GPU accelerated Staggered Update Procedure (SUP)
Shubhashree Subudhi, Amol Khillare, N. Munikrishna, N. Balakrishnan
Computers & Fluids, Volume 283, Article 106408 (published 2024-08-22). DOI: 10.1016/j.compfluid.2024.106408
Advances in the programmability of graphics hardware have opened new opportunities in high performance computing (HPC). The computational fluid dynamics (CFD) community, a significant user of HPC, has begun exploiting the inherent data parallelism of numerical solvers in order to make efficient use of these many-core, high-throughput, accelerator-based processors. In the present work, we examine the process of accelerating our CPU-based Staggered Update Procedure (SUP) solver, a higher-order accurate, cell-centred finite volume solver, by offloading the computationally most expensive region of the code, the explicit residual computation, to the GPU. We adopt OpenACC, a directive-based programming model, to expose parallelism in the code. The framework developed for porting SUP to the GPU is also of value to those intending to port CFD solvers based on the classical finite volume methodology. The performance analysis is conducted using scalar convection–diffusion equations in both two and three dimensions. For the explicit residual alone, speedup factors of 9 in 2D and 28 in 3D are achieved with a single NVIDIA Tesla V100 GPU card. In addition, we establish superior algorithmic scalability by recovering near-perfect serial performance on the heterogeneous CPU+GPU architecture. Further acceleration of the overall code can be achieved by porting other parts of the solver to the GPU.
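The offloading strategy described above, directive-based parallelisation of the explicit residual loop, can be illustrated with a minimal OpenACC sketch in C. It is not the SUP discretisation. It assumes a uniform structured grid, a first-order upwind convective flux with non-negative velocities `a` and `b`, a central diffusive flux, and zero residual in boundary cells. The function name `compute_residual` and the row-major memory layout are hypothetical. Because each cell's residual reads only neighbouring values and writes only its own entry, the loop nest is free of data races, and a single `parallel loop collapse(2)` directive is enough to expose it to the GPU.

```c
#include <assert.h>

/* Hypothetical sketch (NOT the SUP scheme): explicit residual of the 2D scalar
 * convection-diffusion equation u_t + a u_x + b u_y = nu (u_xx + u_yy) on a
 * uniform structured grid of cell-centred finite volumes.
 * Assumptions: a >= 0 and b >= 0 (first-order upwind convective flux),
 * central-difference diffusive flux, zero residual in boundary cells.
 * Arrays hold nx*ny cells in row-major order: cell (i, j) -> j*nx + i. */
void compute_residual(int nx, int ny, double dx, double dy,
                      double a, double b, double nu,
                      const double *restrict u, double *restrict res)
{
    assert(nx >= 3 && ny >= 3);

    /* Offload the whole loop nest to the accelerator. u is copied to the
     * device and res is copied back. Without an OpenACC compiler the pragma
     * is ignored and the loop simply runs serially on the CPU. */
#pragma acc parallel loop collapse(2) copyin(u[0:nx*ny]) copyout(res[0:nx*ny])
    for (int j = 0; j < ny; ++j) {
        for (int i = 0; i < nx; ++i) {
            const int c = j * nx + i;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) {
                res[c] = 0.0; /* boundary cells are not updated here */
            } else {
                /* east/west face fluxes: upwind convection + diffusion */
                double fe = a * u[c]     - nu * (u[c + 1] - u[c]) / dx;
                double fw = a * u[c - 1] - nu * (u[c] - u[c - 1]) / dx;
                /* north/south face fluxes */
                double gn = b * u[c]      - nu * (u[c + nx] - u[c]) / dy;
                double gs = b * u[c - nx] - nu * (u[c] - u[c - nx]) / dy;
                /* du/dt = -(net flux out of the cell) / cell size */
                res[c] = -(fe - fw) / dx - (gn - gs) / dy;
            }
        }
    }
}
```

Compiled without OpenACC support, the same source runs serially, which is one attraction of the directive-based approach. In a full solver, one would typically keep `u` and `res` resident on the device across time steps, for example with an enclosing `acc data` region, instead of copying them on every call. Otherwise, host-device transfers can dominate the runtime.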
About the journal:
Computers & Fluids is multidisciplinary. The term "fluid" is interpreted in the broadest sense. Hydro- and aerodynamics, high-speed and physical gas dynamics, turbulence and flow stability, multiphase flow, rheology, tribology and fluid-structure interaction are all of interest, provided that computer techniques play a significant role in the associated studies or design methodology.