A multi-GPU based high-performance computing framework in elastodynamics simulation using octree meshes

IF 6.9 | Zone 1, Engineering and Technology | Q1, ENGINEERING, MULTIDISCIPLINARY
Shayan Mohammadian, Ankit S. Kumar, Chongmin Song
Computer Methods in Applied Mechanics and Engineering, Volume 436, Article 117723
Publication date: 2025-01-10
DOI: 10.1016/j.cma.2024.117723
URL: https://www.sciencedirect.com/science/article/pii/S0045782524009794
Citation count: 0

Abstract

This paper proposes a high-performance computing framework for large-scale elastodynamic analysis on graphics processing units (GPUs). The study adopts an octree algorithm for automatic mesh generation. The scaled boundary finite element method (SBFEM) is employed with the octree mesh, eliminating hanging nodes between octree cells of different sizes. This approach significantly reduces the computational cost and memory requirement by exploiting the limited number of master cells in a balanced octree grid, and is advantageous for GPU computation. Parallelization is achieved through mesh-partitioning techniques and message-passing-interface (MPI) directives, complemented by the NVIDIA Collective Communication Library (NCCL) for optimal point-to-point communication between GPUs in high-performance computing (HPC) facilities. The HPC framework is implemented for both explicit and implicit dynamic analysis. In the implicit analysis, the system of equations is solved with the preconditioned conjugate gradient method. Numerical examples are presented to validate the implementation and to demonstrate the capabilities of the GPU implementation. An image-based 3D model of a portion of the Moon’s complex surface, with a layered structure comprising approximately 440 million degrees of freedom, is simulated. Using the explicit solver, a speed-up of 865 is achieved on a single computational node equipped with eight NVIDIA A100 GPUs in parallel. A 3D virtual city comprising approximately 61 million degrees of freedom is modelled using the implicit solver.
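As a rough illustration of the preconditioned conjugate gradient method named in the abstract, the following minimal Python sketch solves a small symmetric positive-definite system. The Jacobi (diagonal) preconditioner, the dense `matvec` helper, and the 3×3 test matrix are all assumptions made for illustration; the abstract does not state which preconditioner or data layout the paper uses.

```python
# Illustrative sketch only: Jacobi (diagonal) preconditioning is an assumption;
# the abstract does not say which preconditioner the paper uses.

def matvec(A, x):
    """Dense matrix-vector product; stands in for a mesh-based operator."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b (A symmetric positive definite) with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    x = [0.0] * n                                  # zero initial guess
    r = list(b)                                    # residual r = b - A x with x = 0
    z = [m * r_i for m, r_i in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0               # avoid dividing by zero if b = 0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:       # relative residual check
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # step length along p
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        z = [m * r_i for m, r_i in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz                         # conjugacy coefficient
        p = [z_i + beta * p_i for z_i, p_i in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Small SPD test system (hypothetical values, not from the paper).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(A, b)
```

In a mesh-partitioned setting, each `dot` call would additionally need a global reduction across partitions, and `matvec` would need an exchange of interface values; how the paper organizes these steps is not described in the abstract.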
Source journal
CiteScore: 12.70
Self-citation rate: 15.30%
Articles published: 719
Review time: 44 days
Journal description: Computer Methods in Applied Mechanics and Engineering stands as a cornerstone in the realm of computational science and engineering. With a history spanning over five decades, the journal has been a key platform for disseminating papers on advanced mathematical modeling and numerical solutions. Interdisciplinary in nature, these contributions encompass mechanics, mathematics, computer science, and various scientific disciplines. The journal welcomes a broad range of computational methods addressing the simulation, analysis, and design of complex physical problems, making it a vital resource for researchers in the field.