GPU acceleration of four-way coupled PP-DNS for compressible particle-laden wall turbulence

IF 3.6 | Zone 2, Engineering Technology | JCR Q1, Mechanics

International Journal of Multiphase Flow | Pub Date: 2024-04-17 | DOI: 10.1016/j.ijmultiphaseflow.2024.104840

Zi-Mo Liao, Liang-Bing Chen, Zhen-Hua Wan, Nan-Sheng Liu, Xi-Yun Lu



Abstract

This paper presents an efficient implementation of four-way coupled point-particle direct numerical simulation (PP-DNS) for compressible particle-laden wall turbulence, built on the open-source finite-difference compressible Navier–Stokes solver STREAmS. The proposed design integrates a GPU-based two-phase collision detection algorithm known as the spatial subdivision method, along with specialized storage and MPI communication strategies for Lagrangian particles on multi-GPU platforms. Specifically, a ‘page table’-like data structure is designed to store the particle information compactly and to enable highly parallelized packing and unpacking procedures for GPU-GPU data exchange. These advancements significantly reduce the computational cost of four-way coupled particle-laden flow simulations, enabling efficient simulations involving over $O(10^{7})$ particles (an order of magnitude more than in state-of-the-art simulations) on a single NVIDIA A100 GPU. To validate the proposed implementation, we perform simulations of compressible particle-laden wall-bounded turbulence using canonical configurations such as channel flows and zero-pressure-gradient boundary layers. The example results highlight the effects of inter-particle collisions and flow compressibility. Furthermore, we assess single-GPU performance and scalability by employing up to eight NVIDIA GPU devices. Even for four-way coupled simulations, the elapsed time per step scales approximately linearly with the number of particles (when the number of particles is large enough), and a parallel efficiency of 94.1% is achieved on 8 NVIDIA A100 GPUs.
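To make the two-phase idea behind spatial subdivision concrete, here is a minimal serial Python sketch. It is not the paper's GPU implementation. It assumes equal-radius spherical particles and a uniform grid whose cell edge equals the particle diameter. The function name `detect_collisions` and its parameters are hypothetical, chosen only for illustration. The broad phase bins particles into cells, and the narrow phase runs exact distance tests only against particles in the 27 surrounding cells.

```python
from collections import defaultdict
from itertools import product
import math


def detect_collisions(positions, radius):
    """Return sorted index pairs (i, j), i < j, of overlapping equal-radius spheres.

    Illustrative serial sketch; names and layout are assumptions, not the paper's code.
    """
    # Cell edge equals the particle diameter, so two touching particles
    # can only sit in the same cell or in directly adjacent cells.
    cell = 2.0 * radius
    grid = defaultdict(list)

    # Broad phase: bin every particle by the integer coordinates of its cell.
    for idx, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(idx)

    pairs = set()
    # Narrow phase: exact distance test against the 27 surrounding cells.
    for (cx, cy, cz), members in list(grid.items()):
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = grid.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    # i < j reports each pair once and skips self-tests.
                    if i < j and math.dist(positions[i], positions[j]) < 2.0 * radius:
                        pairs.add((i, j))
    return sorted(pairs)
```

Because each particle is compared only with particles in nearby cells, the work grows roughly linearly with the particle count when particles are spread reasonably evenly over the grid, rather than quadratically as in an all-pairs test. This is consistent with the approximately linear per-step scaling the abstract reports, although the GPU algorithm in the paper is organized differently from this serial loop.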




Source journal

International Journal of Multiphase Flow (Physics: Mechanics)

CiteScore: 7.30

Self-citation rate: 10.50%

Articles published: 244

Review time: 4 months

Journal description: The International Journal of Multiphase Flow publishes analytical, numerical and experimental articles of lasting interest. The scope of the journal includes all aspects of mass, momentum and energy exchange phenomena among different phases such as occur in disperse flows, gas–liquid and liquid–liquid flows, flows in porous media, boiling, granular flows and others. The journal publishes full papers, brief communications and conference announcements.
