Accelerating Gaussian beam tracing method with dynamic parallelism on graphics processing units
Authors: Sheng Zhang, Lishu Duan, Hanbo Jiang
Journal: Computer Physics Communications, volume 315, Article 109722
Publication date: 2025-06-25
DOI: 10.1016/j.cpc.2025.109722
URL: https://www.sciencedirect.com/science/article/pii/S0010465525002243
Publication type: Journal Article
Impact factor: 7.2 (JCR Q1, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
This study presents an efficient implementation of the Gaussian beam tracing (GBT) method utilizing graphics processing units (GPUs) to overcome the performance limitations of traditional CPU-based acoustic simulations. The algorithm was implemented and optimized on an NVIDIA RTX A6000 GPU, significantly enhancing the Gaussian beam summation (GBS) performance. We addressed the challenge of irregular control flows inherent to GBT by leveraging CUDA's dynamic parallelism to effectively flatten and dispatch nested loops directly on the GPU. Additionally, a profiling-driven optimization workflow using NVIDIA Nsight Compute enabled targeted improvements, raising SM throughput from 22.27% to 33.32%, L1 cache throughput from 13.15% to 22.15%, and L2 cache throughput from 9.16% to 21.26%. Consequently, the GPU-accelerated GBS algorithm achieved up to an 817× speedup compared to the original single-threaded CPU implementation, while the full computational pipeline reached 112× acceleration in a city-environment scenario involving 16,384 rays. Furthermore, this study introduces innovative strategies for overcoming GPU memory limitations, enabling efficient processing of large-scale ray datasets beyond single-kernel constraints. Finally, we establish systematic performance evaluation methodologies critical for analyzing and tuning GPU-accelerated algorithms, laying a foundation for future enhancements and scalability improvements.
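The abstract mentions processing ray datasets that exceed what a single kernel launch can hold, but it does not describe the strategy. One common approach, assumed here purely for illustration and not taken from the paper, is to split the rays into batches sized from a device memory budget and process one batch per launch. The function name `plan_batches`, the parameter `bytes_per_ray`, and all numeric values other than the 16,384-ray count are hypothetical.

```python
from typing import Iterator, Tuple


def plan_batches(
    num_rays: int, bytes_per_ray: int, budget_bytes: int
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) ray index ranges that each fit the budget."""
    if bytes_per_ray <= 0 or budget_bytes < bytes_per_ray:
        raise ValueError("budget must hold at least one ray")
    # Largest number of rays whose working storage fits in the budget.
    rays_per_batch = budget_bytes // bytes_per_ray
    for start in range(0, num_rays, rays_per_batch):
        # The final batch may be shorter than the others.
        yield start, min(start + rays_per_batch, num_rays)


# Hypothetical numbers: 16,384 rays (the count quoted in the abstract),
# an assumed 1 MiB of working storage per ray, and an assumed 4 GiB budget.
batches = list(plan_batches(16_384, 1 << 20, 4 << 30))
print(len(batches), batches[0], batches[-1])
```

Each `(start, stop)` range would correspond to one launch over that slice of rays. In a real implementation the per-ray storage would have to be measured rather than assumed.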
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to the following:
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences and software topics related to, and of importance in, the physical sciences may be considered.