A multi-GPU based high-performance computing framework in elastodynamics simulation using octree meshes

IF 6.9 | Zone 1 (Engineering & Technology) | JCR Q1, ENGINEERING, MULTIDISCIPLINARY

Computer Methods in Applied Mechanics and Engineering | Published: 2025-01-10 | DOI: 10.1016/j.cma.2024.117723

Shayan Mohammadian, Ankit S. Kumar, Chongmin Song

Volume 436, Article 117723 | https://www.sciencedirect.com/science/article/pii/S0045782524009794

Citations: 0

Abstract

This paper proposes a high-performance computing framework for large-scale elastodynamic analysis utilizing Graphics Processing Units (GPUs). The study adopts an octree algorithm for automatic mesh generation. The scaled boundary finite element method (SBFEM) is employed with the octree mesh. This removes the hanging-node incompatibility between octree cells of different sizes. Because a balanced octree grid contains only a limited number of master cells, the approach significantly reduces computational cost and memory requirements, which is advantageous for GPU computation. Parallelization is achieved through mesh partitioning and the Message Passing Interface (MPI). It is complemented by the NVIDIA Collective Communication Library (NCCL) for optimal point-to-point communication between GPUs in high-performance computing (HPC) facilities. The HPC framework is implemented for both explicit and implicit dynamic analysis. In the implicit analysis, the system of equations is solved with the preconditioned conjugate gradient method. Numerical examples are presented to validate the implementation and to demonstrate the capabilities of the GPU implementation. An image-based 3D model of a portion of the Moon's complex surface is simulated. The model has a layered structure comprising approximately 440 million degrees of freedom. Using the explicit solver, a speed-up of 865 is achieved on a single computational node equipped with eight NVIDIA A100 GPUs working in parallel. A 3D virtual city comprising approximately 61 million degrees of freedom is modelled using the implicit solver.
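The abstract names the preconditioned conjugate gradient method for the implicit solver but not the preconditioner. The sketch below assumes a simple Jacobi (diagonal) preconditioner; that choice is an assumption, not the authors' stated method. The matrix is accessed only through a `matvec` callable, which mirrors why CG suits GPU codes: it needs only matrix-vector products, never a factorization. The helper names `dot`, `pcg` and `matvec_a` are illustrative.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcg(matvec: Callable[[Vector], Vector], b: Vector, diag: Vector,
        tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG.

    A is only accessed through `matvec`, so it never has to be assembled;
    `diag` holds the diagonal of A, used here as the (assumed) Jacobi preconditioner.
    """
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x                                 # trivial right-hand side
    r = list(b)                                  # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]     # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = rz / dot(p, ap)                  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:     # relative residual check
            break
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz                       # keeps search directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Usage: a small SPD tridiagonal system whose exact solution is [1, 1, 1].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]


def matvec_a(v: Vector) -> Vector:
    return [dot(row, v) for row in A]


x = pcg(matvec_a, [3.0, 2.0, 3.0], [A[i][i] for i in range(3)])
print(x)
```

In an implicit time-stepping scheme, `matvec` would apply an effective stiffness. One common form is K + c M, with c depending on the integrator. That operator can be applied cell by cell without ever forming the global matrix.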
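For the explicit solver, the abstract does not state the time-integration scheme. The sketch below assumes the standard central-difference method with a lumped (diagonal) mass matrix, a common pairing for explicit elastodynamics. With a diagonal mass, each step reduces to one internal-force evaluation and an element-wise division, which parallelizes well on GPUs. The function name `central_difference` and the one-degree-of-freedom test problem are illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]


def central_difference(mass: Vector, internal_force: Callable[[Vector], Vector],
                       u0: Vector, v0: Vector, dt: float, n_steps: int) -> Vector:
    """Integrate M u'' + f_int(u) = 0 explicitly with a lumped (diagonal) mass M.

    Each step costs one internal-force evaluation plus an element-wise division
    by the mass; no linear system is solved. Stable only for dt < 2 / omega_max.
    """
    a0 = [-f / m for f, m in zip(internal_force(u0), mass)]
    # Fictitious displacement at t = -dt obtained from a Taylor expansion.
    u_prev = [u - dt * v + 0.5 * dt * dt * a for u, v, a in zip(u0, v0, a0)]
    u = list(u0)
    for _ in range(n_steps):
        f = internal_force(u)
        # u_{n+1} = 2 u_n - u_{n-1} - dt^2 M^{-1} f_int(u_n)
        u_next = [2.0 * ui - up - dt * dt * fi / mi
                  for ui, up, fi, mi in zip(u, u_prev, f, mass)]
        u_prev, u = u, u_next
    return u


# Usage: one DOF with m = 1 and k = 4 (omega = 2), released from u = 1 at rest.
def spring_force(u: Vector) -> Vector:
    return [4.0 * u[0]]


u_end = central_difference([1.0], spring_force, [1.0], [0.0], dt=1e-3, n_steps=1000)
print(u_end[0], math.cos(2.0))  # numerical vs exact displacement at t = 1
```

The conditional stability limit dt < 2/omega_max is why explicit runs on large meshes take many small steps. This makes the per-step cost, which the GPU accelerates, the dominant factor.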
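The memory saving attributed to the "limited number of master cells" rests on geometric similarity. For 3D linear elasticity, a cell scaled by a factor h has stiffness h times the unit cell's stiffness and mass h^3 times the unit cell's mass, assuming the same material. So only one matrix pair per distinct cell pattern needs to be computed and stored. The sketch below illustrates that caching idea. `reference_matrices` is a hypothetical placeholder returning identity matrices instead of a real SBFEM solution, and the 18-flag `pattern` encoding (12 mid-edge plus 6 mid-face nodes) is likewise hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple

Matrix = List[List[float]]
reference_solves = 0  # how many expensive master-cell computations actually ran


@lru_cache(maxsize=None)
def reference_matrices(pattern: Tuple[int, ...]) -> Tuple[Matrix, Matrix]:
    """HYPOTHETICAL stand-in for the SBFEM solution of a unit-size master cell.

    `pattern` is a hypothetical encoding of which optional mid-edge/mid-face nodes
    the cell has. The placeholder returns identity matrices; a real code would
    solve the SBFEM equations for the polyhedral cell here.
    """
    global reference_solves
    reference_solves += 1
    n_dof = 3 * (8 + sum(pattern))  # 8 corner nodes plus extra nodes, 3 DOFs each
    identity = [[1.0 if i == j else 0.0 for j in range(n_dof)] for i in range(n_dof)]
    return identity, [row[:] for row in identity]


def cell_matrices(pattern: Tuple[int, ...], h: float) -> Tuple[Matrix, Matrix]:
    """Scale the cached master-cell matrices to a geometrically similar cell of size h."""
    k_ref, m_ref = reference_matrices(pattern)
    stiffness = [[h * v for v in row] for row in k_ref]   # 3D elasticity: K scales with h
    mass = [[h ** 3 * v for v in row] for row in m_ref]   # M scales with volume, h^3
    return stiffness, mass


# Usage: four cells of different sizes but only two distinct node patterns.
regular = (0,) * 18
one_hanging = (1,) + (0,) * 17
cells = [(regular, 1.0), (regular, 0.5), (one_hanging, 0.25), (regular, 0.125)]
results = [cell_matrices(p, h) for p, h in cells]
print(reference_solves)  # 2
```

Different materials would add a further linear factor (for example Young's modulus on the stiffness and density on the mass). The cache key therefore only needs the geometric pattern.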
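The mesh-partitioning strategy can be illustrated with an element-by-element matrix-vector product, the core operation of both solvers. Each subdomain computes its own partial result. Nodes on the interface between subdomains then need their contributions summed across subdomains. In the paper this exchange runs between GPUs via MPI and NCCL. In the sketch below, a plain Python loop stands in for that point-to-point communication, global node numbering is kept for simplicity, and a 1D bar chain replaces the octree mesh. The function names are illustrative.

```python
from typing import Dict, List

Vector = List[float]


def element_matvec(elements: List[int], k: float, x: Vector, n_nodes: int) -> Vector:
    """Element-by-element y = K x for two-node bars (element e joins nodes e and e + 1)."""
    y = [0.0] * n_nodes
    for e in elements:
        i, j = e, e + 1
        # Element stiffness k * [[1, -1], [-1, 1]] applied without global assembly.
        y[i] += k * (x[i] - x[j])
        y[j] += k * (x[j] - x[i])
    return y


def partitioned_matvec(parts: List[List[int]], k: float, x: Vector,
                       n_nodes: int) -> List[Dict[int, float]]:
    """Each subdomain computes a partial product, then adds neighbours' values at shared nodes."""
    partial = [element_matvec(elems, k, x, n_nodes) for elems in parts]
    node_sets = [{n for e in elems for n in (e, e + 1)} for elems in parts]
    results = []
    for a, nodes_a in enumerate(node_sets):
        y_a = {n: partial[a][n] for n in nodes_a}
        for b, nodes_b in enumerate(node_sets):
            if b != a:
                for n in nodes_a & nodes_b:   # interface nodes shared with subdomain b
                    y_a[n] += partial[b][n]   # stands in for a point-to-point receive
        results.append(y_a)
    return results


# Usage: 6 elements / 7 nodes split into three subdomains sharing nodes 2 and 4.
n_nodes = 7
u_nodal = [float(i * i) for i in range(n_nodes)]
serial = element_matvec(list(range(6)), 2.0, u_nodal, n_nodes)
local = partitioned_matvec([[0, 1], [2, 3], [4, 5]], 2.0, u_nodal, n_nodes)
print(serial)
```

Only interface values cross subdomain boundaries, so communication grows with the partition surface rather than its volume. This is what makes point-to-point exchange between GPUs worthwhile.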

Source journal

Computer Methods in Applied Mechanics and Engineering (Engineering & Technology: Engineering, Multidisciplinary)

CiteScore: 12.70

Self-citation rate: 15.30%

Articles published: 719

Review time: 44 days

Journal description: Computer Methods in Applied Mechanics and Engineering is a cornerstone journal in computational science and engineering. For over five decades it has been a key platform for papers on advanced mathematical modelling and numerical solution methods. Its interdisciplinary contributions span mechanics, mathematics, computer science, and other scientific disciplines. The journal welcomes a broad range of computational methods for the simulation, analysis, and design of complex physical problems, making it a vital resource for researchers in the field.