Performance of a high-order MPI-Kokkos accelerated fluid solver
Authors: Filipp Sporykhin, Holger Homann
Journal: Computer Physics Communications, volume 318, Article 109873 (published 2025-09-25)
DOI: 10.1016/j.cpc.2025.109873
URL: https://www.sciencedirect.com/science/article/pii/S0010465525003753
This work discusses the performance of a modern numerical scheme for fluid dynamical problems on modern high-performance computing (HPC) architectures. Our code implements a spatial nodal discontinuous Galerkin (NDG) scheme that we test up to eighth order of convergence. It is coupled in time to a set of Runge-Kutta (RK) methods of orders up to six. The code integrates the linear advection equations as well as the isothermal Euler equations in one, two, and three dimensions. To target modern hardware involving many-core Central Processing Units (CPUs) and accelerators such as Graphics Processing Units (GPUs), we use the Kokkos library in conjunction with the Message Passing Interface (MPI) to run our single-source code on various NVIDIA and AMD GPU systems.
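For reference, the two model systems named above are commonly written in the conservation form below. The symbols (u, a, ρ, v, c_s) are notation assumed here for illustration; the paper's own notation and normalization may differ.

```latex
% Linear advection of a scalar field u with constant velocity \mathbf{a}
\partial_t u + \mathbf{a} \cdot \nabla u = 0

% Isothermal Euler equations: density \rho, velocity \mathbf{v},
% constant sound speed c_s, so that the pressure is p = c_s^2 \rho
\partial_t \rho + \nabla \cdot (\rho \mathbf{v}) = 0
\partial_t (\rho \mathbf{v}) + \nabla \cdot \left( \rho \, \mathbf{v} \otimes \mathbf{v} + c_s^2 \rho \, \mathbf{I} \right) = 0
```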
Using one- and two-dimensional simulations of simple test equations, we find that the higher the order, the faster the code. Eighth-order simulations attain a given global error with much less computing time than third- or fourth-order simulations. The RK scheme has a smaller impact on the code performance, and a classical fourth-order scheme generally seems to be a good choice.
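As a minimal sketch of what "a classical fourth-order scheme" refers to, the following applies the textbook RK4 update to a scalar test problem and measures the observed order of convergence by halving the time step. The test equation dy/dt = -y and the helper names (`rk4_step`, `integrate`, `decay`, `observed_order`) are assumptions chosen for illustration; this is not the paper's solver, which couples RK time stepping to an NDG spatial discretization.

```python
import math


def rk4_step(f, t, y, dt):
    """Advance y by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * dt, y + 0.5 * dt * k1)
    k3 = f(t + 0.5 * dt, y + 0.5 * dt * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(f, y0, t_end, n_steps):
    """Integrate dy/dt = f(t, y) from t = 0 to t_end with n_steps equal steps."""
    dt = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, dt)
        t += dt
    return y


def decay(t, y):
    """Illustrative test problem dy/dt = -y, with exact solution exp(-t)."""
    return -y


def observed_order(n_steps):
    """Estimate the convergence order from the errors at dt and dt / 2."""
    exact = math.exp(-1.0)
    err_coarse = abs(integrate(decay, 1.0, 1.0, n_steps) - exact)
    err_fine = abs(integrate(decay, 1.0, 1.0, 2 * n_steps) - exact)
    return math.log2(err_coarse / err_fine)


if __name__ == "__main__":
    print(f"observed order: {observed_order(20):.2f}")
```

Halving the step should reduce the global error by roughly a factor of 16 for a fourth-order method, so the estimate is expected to land close to 4.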
The code performs very well on all considered HPC GPUs. We observe very good scaling properties up to 64 AMD MI250x GPUs and we show that the scaling properties are the same in two and three dimensions. The many-CPU performance is also very good, and perfect weak scaling is observed up to many hundreds of CPU cores using MPI. We note that small grid-size simulations are faster on CPUs than on GPUs, while GPUs win significantly over CPUs for simulations involving more than 10^7 degrees of freedom (≈ 3100^2 grid points). When it comes to the environmental impact of numerical simulations, we estimate that GPUs consume less energy than CPUs for large grid-size simulations but more energy on small grids. Further, we observe a tendency that the more modern the GPU, the larger the grid needs to be in order to use it efficiently. This yields a rebound effect because larger simulations need longer computing times and in turn more energy, which is not compensated by the energy efficiency gain of the newer GPUs.
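The size threshold and the notion of weak scaling can be made concrete with a little arithmetic. The sketch below assumes one degree of freedom per grid point, as the abstract's own comparison suggests, and uses the usual definition of weak-scaling efficiency (baseline runtime divided by runtime on more resources at fixed work per resource). The function names and any timings passed to them are hypothetical and are not measurements from the paper.

```python
def points_per_dimension(total_dof, dim):
    """Grid points per direction for a cubic grid with total_dof points in dim dimensions."""
    if total_dof <= 0 or dim <= 0:
        raise ValueError("total_dof and dim must be positive")
    return round(total_dof ** (1.0 / dim))


def weak_scaling_efficiency(t_base, t_scaled):
    """Baseline runtime over scaled runtime, with fixed work per resource.

    A value of 1.0 corresponds to perfect weak scaling.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("runtimes must be positive")
    return t_base / t_scaled


if __name__ == "__main__":
    # 3100^2 grid points is close to the 10^7 threshold quoted in the abstract.
    print(3100 ** 2)
    # Equivalent edge lengths for 10^7 points in two and three dimensions.
    print(points_per_dimension(10 ** 7, 2), points_per_dimension(10 ** 7, 3))
    # Hypothetical runtimes in seconds, for illustration only.
    print(weak_scaling_efficiency(10.0, 12.5))
```

Under the same assumption, a three-dimensional run reaches the 10^7 threshold at a little over 200 points per direction.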
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to the following.
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special purpose computers on computing in the physical sciences and software topics related to, and of importance in, the physical sciences may be considered.