Authors: Guillaume Helbecque, Ezhilmathi Krishnasamy, Tiago Carneiro, Nouredine Melab, Pascal Bouvry
Journal: Concurrency and Computation: Practice and Experience, vol. 37, no. 25-26 (published 2025-10-02)
DOI: 10.1002/cpe.70321
URL: https://onlinelibrary.wiley.com/doi/10.1002/cpe.70321
Portable PGAS-Based GPU-Accelerated Branch-And-Bound Algorithms at Scale
The Branch-and-Bound (B&B) technique plays a key role in solving many combinatorial optimization problems, enabling efficient problem-solving and decision-making in a wide range of applications. It incrementally constructs a tree of candidate solutions and abandons a candidate as soon as it determines that the candidate cannot lead to an optimal solution. With modern problems growing increasingly large, accelerating B&B algorithms through parallelization has become a critical challenge for handling large solution spaces. At the same time, modern parallel computing systems are themselves becoming larger, more heterogeneous, and more diverse, requiring programming approaches capable of effectively exploiting such complexity. To address these challenges, this work presents a GPU-accelerated B&B algorithm based on the Partitioned Global Address Space (PGAS) programming model, implemented using the Chapel language. The PGAS-based design is motivated by the high-level abstraction provided by this programming model, which favors programmability, whereas the vendor-neutral GPU features of the Chapel language favor GPU portability. The algorithm uses a pool-based approach for generality and exploits a dynamic load balancing mechanism for performance scalability. Extensive experimentation on the N-Queens and permutation flowshop scheduling problems demonstrates both the performance and the portability of the proposed algorithm on several GPU architectures, compared to optimized CUDA-based implementations. Moreover, the strong scaling efficiency of the proposed algorithm is investigated on a TOP500 pre-exascale supercomputer using up to 1024 GPUs.
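To make the pool-based tree search described in the abstract concrete, here is a minimal single-threaded Python sketch for N-Queens. It is not the authors' Chapel implementation: there is no GPU offloading, no PGAS distribution, and no load balancing, and the names `is_safe` and `count_nqueens` are hypothetical. It only illustrates the core loop, assuming a work pool of partial solutions from which a node is taken, branched into children, and infeasible children are discarded. N-Queens has no objective bound, so pruning here is purely a feasibility check.

```python
def is_safe(placed, col):
    """Return True if a queen in the next row at column `col`
    attacks none of the queens already in `placed`.

    `placed[r]` is the column of the queen in row r.
    """
    row = len(placed)
    for r, c in enumerate(placed):
        # Same column, or same diagonal (column distance equals row distance).
        if c == col or abs(c - col) == row - r:
            return False
    return True


def count_nqueens(n):
    """Count N-Queens solutions using an explicit pool of partial placements.

    Returns (number_of_solutions, number_of_explored_nodes).
    """
    pool = [()]      # work pool, seeded with the root node (empty board)
    solutions = 0
    explored = 0
    while pool:
        node = pool.pop()          # LIFO removal gives depth-first order
        explored += 1
        if len(node) == n:         # all rows filled: a complete solution
            solutions += 1
            continue
        for col in range(n):       # branch: try every column in the next row
            if is_safe(node, col): # prune: drop infeasible children early
                pool.append(node + (col,))
    return solutions, explored


if __name__ == "__main__":
    solutions, explored = count_nqueens(8)
    print(f"8-Queens: {solutions} solutions")  # 92 solutions
```

Keeping the pending nodes in an explicit pool, rather than on the call stack, is what makes this style of search easy to parallelize in principle: batches of nodes can be handed to different workers, which is the role the pool plays in the approach the abstract describes.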
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.