GPU-native adaptive mesh refinement with application to lattice Boltzmann simulations

IF 3.4; Zone 2 (Physics and Astrophysics); Q1 (Computer Science, Interdisciplinary Applications)

Computer Physics Communications Pub Date : 2025-02-12 DOI:10.1016/j.cpc.2025.109543

Khodr Jaber , Ebenezer E. Essel , Pierre E. Sullivan

Volume 311, Article 109543. Publisher page: https://www.sciencedirect.com/science/article/pii/S0010465525000463

Abstract

Adaptive Mesh Refinement (AMR) enables efficient computation of flows by providing high resolution in critical regions while allowing for coarsening in areas where fine detail is unnecessary. While early AMR software packages relied solely on CPU parallelization, the widespread adoption of heterogeneous computing systems has led to GPU-accelerated implementations. In these hybrid approaches, simulation data typically resides on the GPU, and mesh management and adaptation occur exclusively on the CPU, necessitating frequent data transfers between them. A more efficient strategy is to adapt and maintain the entire mesh structure exclusively on the GPU, eliminating these transfers. Because of its inherent parallelism, the Lattice Boltzmann Method (LBM) has been widely implemented in hybrid AMR frameworks. This work presents a GPU-native algorithm for AMR using a block-based forest of octrees approach, implemented in both two and three dimensions as open-source C++/CUDA code. The implementation includes a Lattice Boltzmann solver for weakly compressible flow, though the underlying grid refinement procedure is compatible with any solver operating on cell-centered block-based grids. The lid-driven cavity and flow past a square cylinder benchmarks validate the algorithm's effectiveness across multiple velocity sets in both single- and double-precision. Tests conducted on consumer and datacenter-grade GPUs demonstrate its versatility across different hardware platforms.

Link to repository: https://github.com/KhodrJ/AGAL.
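To make the "block-based forest of octrees" idea concrete, the following is a minimal serial C++ sketch of the bookkeeping involved in refining one block. It is an illustration under stated assumptions, not the AGAL implementation: the names `Block`, `Forest`, `add_root`, `refine`, and `leaf_count`, the contiguous-children layout, and the choice of 4 cells per block side are all hypothetical. It runs on the CPU for clarity; a GPU-native version, as the abstract describes, would keep these arrays in device memory and perform the allocation in parallel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names and constants here are hypothetical, chosen for illustration only.
constexpr int kDim = 2;                  // 2 gives a quadtree, 3 an octree
constexpr int kChildren = 1 << kDim;     // 4 children in 2D, 8 in 3D
constexpr int kCellsPerSide = 4;         // every block holds the same cell count
constexpr std::int32_t kNone = -1;

struct Block {
    std::int32_t parent = kNone;
    std::int32_t first_child = kNone;    // children are stored contiguously
    int level = 0;                       // 0 for root blocks
    std::array<double, kDim> origin{};   // lower corner of the block
    double cell_size = 1.0;              // edge length of one cell
};

struct Forest {
    std::vector<Block> blocks;

    // Add a root block; several roots together form the "forest".
    std::int32_t add_root(std::array<double, kDim> origin, double cell_size) {
        Block b;
        b.origin = origin;
        b.cell_size = cell_size;
        blocks.push_back(b);
        return static_cast<std::int32_t>(blocks.size() - 1);
    }

    bool is_leaf(std::int32_t id) const {
        return blocks[id].first_child == kNone;
    }

    // Split a leaf block into 2^kDim children with half the cell size.
    // Returns the index of the first child.
    std::int32_t refine(std::int32_t id) {
        assert(is_leaf(id));
        const Block parent = blocks[id];  // copy: push_back may reallocate
        const auto first = static_cast<std::int32_t>(blocks.size());
        const double child_cell = parent.cell_size / 2.0;
        const double child_extent = kCellsPerSide * child_cell;
        for (int c = 0; c < kChildren; ++c) {
            Block child;
            child.parent = id;
            child.level = parent.level + 1;
            child.cell_size = child_cell;
            // Bit d of the child number selects the lower or upper half on axis d.
            for (int d = 0; d < kDim; ++d) {
                child.origin[d] = parent.origin[d] + ((c >> d) & 1) * child_extent;
            }
            blocks.push_back(child);
        }
        blocks[id].first_child = first;
        return first;
    }

    // Leaves are the blocks on which a solver would actually advance cells.
    std::size_t leaf_count() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            if (blocks[i].first_child == kNone) ++n;
        }
        return n;
    }
};
```

Because every block carries the same number of cells, refining a block halves the cell size rather than changing the block's storage footprint, which is one reason block-based layouts map well onto fixed-size GPU thread blocks. Coarsening (releasing the children of a block) is omitted here to keep the sketch short.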

Source journal

Computer Physics Communications (Physics; Computer Science, Interdisciplinary Applications)

CiteScore: 12.10

Self-citation rate: 3.20%

Articles published: 287

Review time: 5.3 months

About the journal: The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper. Computer Programs in Physics (CPiP): These papers describe significant computer programs to be archived in the CPC Program Library, which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged. Computational Physics Papers (CP): These are research papers in, but not limited to, the following themes across computational physics and related disciplines: mathematical and numerical methods and algorithms; computational models, including those associated with the design, control and analysis of experiments; and algebraic computation. Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences, and on software topics related to, and of importance in, the physical sciences, may be considered.