Shuhua Zeng, Shaobo Yao, Junyuan Yang, Wenwen Zhao, Jiaqi An, Weifang Chen
Computer Physics Communications, volume 318, Article 109905. Published 2025-10-15. DOI: 10.1016/j.cpc.2025.109905. URL: https://www.sciencedirect.com/science/article/pii/S0010465525004060
Heterogeneous CPU-GPU parallelization for nonlinear coupled constitutive relations in hypersonic rarefied non-equilibrium flows
The nonlinear coupled constitutive relations (NCCR) have proven to be a promising tool for modelling rarefied non-equilibrium flows. To improve the efficiency of NCCR solutions for complex hypersonic flows, this study carries out the first migration of the NCCR method to a graphics processing unit (GPU) platform, using the compute unified device architecture (CUDA) and message passing interface (MPI) programming models. The data-parallel lower-upper relaxation (DPLUR) implicit scheme is applied to avoid data dependence during the computation. After code validation, three numerical cases were investigated to assess the performance improvement of the GPU-NCCR solver on a CPU-GPU heterogeneous parallel computing platform: hypersonic flow around a blunt cylinder with varying mesh sizes, around a circular cylinder at different Knudsen numbers, and around a hypersonic technology vehicle (HTV)-type vehicle at various Mach numbers. The results show that the GPU-accelerated NCCR solver on a single NVIDIA GeForce RTX 4090 GPU achieves speedups of one to two orders of magnitude, ranging from 54.5 to 179.3, relative to the CPU-only NCCR solution on a single AMD EPYC 7T83 CPU core. Within the available computing capacity, the performance gain of the GPU-NCCR algorithm increases with mesh size and is only slightly affected by the incoming flow conditions. More importantly, the GPU-accelerated NCCR solver attains a greater speedup than the GPU-NS solver in hypersonic non-equilibrium flows. These advantages are expected to make the NCCR method a fairly efficient engineering approach for modelling rarefied non-equilibrium flows around hypersonic vehicles.
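The abstract credits the DPLUR scheme with avoiding data dependence. As a rough illustration only, and not the authors' solver, the sketch below applies Jacobi-style relaxation sweeps to a small dense linear system: each unknown is updated solely from the values of the previous sweep, so all updates within one sweep are independent and could be assigned to separate GPU threads. The function name `dplur_like_sweeps`, the dense-matrix storage, and the tridiagonal test system are assumptions made for this example.

```python
def dplur_like_sweeps(diag, off_diag, rhs, n_sweeps):
    """Jacobi-style relaxation for (D + O) x = rhs (hypothetical helper).

    Every entry of the new iterate reads only the previous iterate,
    so no update within a sweep depends on another update of that sweep.
    """
    n = len(rhs)
    # Initial guess: neglect the off-diagonal coupling entirely.
    x = [rhs[i] / diag[i] for i in range(n)]
    for _ in range(n_sweeps):
        x_old = x  # frozen copy of the previous sweep
        x = [
            (rhs[i] - sum(off_diag[i][j] * x_old[j] for j in range(n) if j != i))
            / diag[i]
            for i in range(n)
        ]
    return x


# Small diagonally dominant tridiagonal system whose exact solution is all ones.
n = 5
diag = [4.0] * n
off_diag = [[-1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
rhs = [diag[i] + sum(off_diag[i]) for i in range(n)]

solution = dplur_like_sweeps(diag, off_diag, rhs, n_sweeps=40)
print(max(abs(v - 1.0) for v in solution))  # maximum error against the exact solution
```

A Gauss-Seidel-type sweep would instead reuse values already updated in the current sweep, which forces an ordering between cells; freezing `x_old` is what removes that ordering here.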
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to:
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architecture and special-purpose computers on computing in the physical sciences, and on software topics related to and of importance in the physical sciences, may be considered.