Zi-Mo Liao, Liang-Bing Chen, Zhen-Hua Wan, Nan-Sheng Liu, Xi-Yun Lu
GPU acceleration of four-way coupled PP-DNS for compressible particle-laden wall turbulence
International Journal of Multiphase Flow (impact factor 3.6, JCR Q1, Mechanics), published 2024-04-17
DOI: 10.1016/j.ijmultiphaseflow.2024.104840
https://www.sciencedirect.com/science/article/pii/S0301932224001198
Citations: 0
Abstract
This paper presents an efficient implementation of four-way coupled point-particle direct numerical simulation (PP-DNS) for compressible particle-laden wall turbulence, built on the open-source finite-difference compressible Navier–Stokes solver STREAmS. The proposed design integrates a GPU-based two-phase collision detection algorithm, known as the spatial subdivision method, with specialized storage and MPI communication strategies for Lagrangian particles on multi-GPU platforms. Specifically, a ‘page table’-like data structure is designed to store the particle information compactly and to enable highly parallelized packing and unpacking procedures for GPU-GPU data exchange. These advancements significantly reduce the computational cost of four-way coupled particle-laden flow simulations, enabling efficient simulations involving over O(10^7) particles (an order of magnitude more than in state-of-the-art simulations) on a single NVIDIA A100 GPU. To validate the proposed implementation, we perform simulations of compressible particle-laden wall-bounded turbulence in canonical configurations such as channel flows and zero-pressure-gradient boundary layers. The example results highlight the effects of inter-particle collisions and flow compressibility. Furthermore, we assess single-GPU performance and scalability on up to eight NVIDIA GPU devices. Even for four-way coupled simulations, the elapsed time per step scales approximately linearly with the number of particles (when the number of particles is large enough), and a parallel efficiency of 94.1% is achieved on 8 NVIDIA A100 GPUs.
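The two-phase collision detection mentioned in the abstract can be illustrated with a minimal serial sketch. This is not the authors' GPU implementation: the function name `find_collisions`, the equal-radius assumption, and the choice of a cell edge equal to one particle diameter are all assumptions made here for illustration. The broad phase bins particles into a uniform grid; the narrow phase then runs exact distance tests only between particles in the same or adjacent cells.

```python
from collections import defaultdict
from itertools import product
import math


def find_collisions(positions, radius):
    """Return sorted (i, j) pairs, i < j, of overlapping equal-radius spheres.

    Hypothetical CPU sketch of uniform-grid (spatial subdivision) detection.
    """
    cell_size = 2.0 * radius  # assumed: cell edge equals one particle diameter

    # Broad phase: bin every particle into the grid cell containing its centre.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        )
        cells[key].append(idx)

    # Narrow phase: exact distance test against the 27 surrounding cells only.
    min_dist_sq = (2.0 * radius) ** 2
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    if i < j:  # skip self-pairs and mirrored duplicates
                        d2 = sum(
                            (a - b) ** 2
                            for a, b in zip(positions[i], positions[j])
                        )
                        if d2 <= min_dist_sq:
                            pairs.add((i, j))
    return sorted(pairs)
```

Because two touching spheres have centres at most one diameter apart, their cells can differ by at most one index in each direction, so checking the 27 neighbouring cells is sufficient. For roughly uniform particle distributions this keeps the number of exact tests per particle bounded, which is consistent with the near-linear cost scaling the abstract reports, although the actual GPU algorithm and data layout in the paper may differ substantially.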
About the journal:
The International Journal of Multiphase Flow publishes analytical, numerical and experimental articles of lasting interest. The scope of the journal includes all aspects of mass, momentum and energy exchange phenomena among different phases such as occur in disperse flows, gas–liquid and liquid–liquid flows, flows in porous media, boiling, granular flows and others.
The journal publishes full papers, brief communications and conference announcements.