V. HarichandM., Bharatkumar Sharma, G. Sudhakaran, V. Ashok
2018 IEEE 25th International Conference on High Performance Computing (HiPC), published 2018-12-01. DOI: https://doi.org/10.1109/HiPC.2018.00025
Acceleration of an Adaptive Cartesian Mesh CFD Solver in the Current Generation Processor Architectures
In this paper, the challenges involved in accelerating PARAS-3D, an adaptive Cartesian mesh CFD solver, on current-generation processors (CPUs and GPUs) are explored. CFD codes are known to be memory bound, which remains a significant bottleneck in achieving higher performance. Adaptive Cartesian meshes, with their oct-tree structure, bring additional challenges for data parallelism. Moreover, Cartesian mesh solvers have higher memory bandwidth requirements because of their larger and varying stencils. The paper details how the redesign and reimplementation of a legacy Cartesian mesh CFD solver achieved higher performance on CPUs through improvements in algorithms and data structures. Very good scalability to thousands of cores was achieved using asynchronous communication and weighted graph partitioning. For GPU acceleration, a Structure of Arrays data layout was used along with GPU features such as Unified Memory and Multi-Process Service, giving a 4.4x speedup over the CPU-only version on NVIDIA Quadro GV100 GPUs.
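The abstract names a Structure of Arrays data layout but does not show it. The following is a minimal C++ sketch of the general idea only, not the solver's actual data structures: the field names (`density`, `momentum_x`, `energy`, and so on) and the helper functions `to_soa` and `total_mass_soa` are hypothetical. It contrasts an Array of Structures, where all fields of one cell are adjacent in memory, with a Structure of Arrays, where each field is stored in its own contiguous array so that a loop touching a single field reads memory with unit stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AoS): all fields of one cell sit together in memory.
// Field names are illustrative assumptions, not taken from the paper.
struct CellAoS {
    double density;
    double momentum_x;
    double momentum_y;
    double momentum_z;
    double energy;
};

// Structure of Arrays (SoA): each field is its own contiguous array.
struct CellsSoA {
    std::vector<double> density;
    std::vector<double> momentum_x;
    std::vector<double> momentum_y;
    std::vector<double> momentum_z;
    std::vector<double> energy;

    explicit CellsSoA(std::size_t n)
        : density(n), momentum_x(n), momentum_y(n), momentum_z(n), energy(n) {}

    std::size_t size() const { return density.size(); }
};

// Convert an AoS cell list into the SoA layout (hypothetical helper).
CellsSoA to_soa(const std::vector<CellAoS>& cells) {
    CellsSoA out(cells.size());
    for (std::size_t i = 0; i < cells.size(); ++i) {
        out.density[i]    = cells[i].density;
        out.momentum_x[i] = cells[i].momentum_x;
        out.momentum_y[i] = cells[i].momentum_y;
        out.momentum_z[i] = cells[i].momentum_z;
        out.energy[i]     = cells[i].energy;
    }
    return out;
}

// Sum density * volume over all cells. Only the density array is read,
// so the loop streams through contiguous memory instead of striding
// over the unused momentum and energy fields.
double total_mass_soa(const CellsSoA& cells, const std::vector<double>& volume) {
    assert(volume.size() == cells.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        sum += cells.density[i] * volume[i];
    }
    return sum;
}
```

On a GPU the same layout lets consecutive threads read consecutive elements of one array, which generally helps memory coalescing. How much this contributes to the reported speedup is not stated in the abstract.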