Portable PGAS-Based GPU-Accelerated Branch-And-Bound Algorithms at Scale

Impact factor 1.5; Region 4, Computer Science; JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Concurrency and Computation-Practice & Experience Pub Date : 2025-10-02 DOI:10.1002/cpe.70321

Guillaume Helbecque, Ezhilmathi Krishnasamy, Tiago Carneiro, Nouredine Melab, Pascal Bouvry

Volume: 37 (25-26). Full text: https://onlinelibrary.wiley.com/doi/10.1002/cpe.70321

Citations: 0

Abstract

The Branch-and-Bound (B&B) technique plays a key role in solving many combinatorial optimization problems, enabling efficient problem-solving and decision-making in a wide range of applications. It incrementally constructs a tree by building candidates to the solutions and abandoning a candidate as soon as it determines that it cannot lead to an optimal solution. With modern problems growing increasingly large, accelerating B&B algorithms through parallelization has become a critical challenge for handling large solution spaces. At the same time, modern parallel computing systems themselves are becoming larger, more heterogeneous, and more diverse, requiring programming approaches capable of effectively exploiting such complexity. To address these challenges, this work presents a GPU-accelerated B&B algorithm based on the Partitioned Global Address Space (PGAS) programming model, implemented using the Chapel language. The PGAS-based design is motivated by the high-level abstraction provided by this programming model, which favors programmability, whereas vendor-neutral GPU features of the Chapel language favor GPU portability. The algorithm uses a pool-based approach for generality and exploits a dynamic load balancing mechanism for performance scalability. Extensive experimentation on the N-Queens and permutation flowshop scheduling problems demonstrated both code performance and code portability of the proposed algorithm on several GPU architectures compared to optimized CUDA-based implementations. Moreover, the strong scaling efficiency of the proposed algorithm is investigated on a TOP500 pre-exascale supercomputer up to 1024 GPUs.
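To make the pool-based exploration mentioned in the abstract more concrete, here is a minimal sequential sketch in Python for the N-Queens problem. It is not the authors' Chapel implementation and contains no GPU offloading or load balancing. The names `Node`, `is_safe`, and `count_n_queens` are hypothetical, and the permutation-based node encoding is an assumption made for illustration. Since N-Queens has no objective to bound, "abandoning a candidate" here means discarding a placement that conflicts with the queens already on the board.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A partial solution: the first `depth` rows of `board` are fixed."""
    depth: int   # number of queens already placed
    board: list  # board[i] = column of the queen in row i (a permutation)


def is_safe(board, depth, col):
    """Return True if a queen at (row=depth, column=col) shares no diagonal
    with the queens in rows 0..depth-1. Rows and columns never clash because
    the board is kept as a permutation of the columns."""
    for row in range(depth):
        if abs(board[row] - col) == depth - row:
            return False
    return True


def count_n_queens(n):
    """Count N-Queens solutions with a single work pool used as a stack."""
    pool = [Node(0, list(range(n)))]  # the pool starts with the root node
    solutions = 0
    explored = 0
    while pool:
        node = pool.pop()             # depth-first: take the newest node
        explored += 1
        if node.depth == n:           # every row holds a queen
            solutions += 1
            continue
        # Branch: try each still-unused column for the next row.
        for j in range(node.depth, n):
            col = node.board[j]
            if is_safe(node.board, node.depth, col):  # prune conflicts
                child = node.board.copy()
                child[node.depth], child[j] = child[j], child[node.depth]
                pool.append(Node(node.depth + 1, child))
    return solutions, explored


if __name__ == "__main__":
    for n in (4, 6, 8):
        found, visited = count_n_queens(n)
        print(f"N={n}: {found} solutions, {visited} nodes explored")
```

In a parallel setting, the pool is the natural unit to split or share among workers. The safety check is the step that is evaluated once per child node, so it is the most obvious candidate for batched evaluation on an accelerator, although this sketch does neither.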




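Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 5.00

Self-citation rate: 10.00%

Articles published: 664

Review time: 9.6 months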

About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.