{"title":"粒子单元的混合编程和多gpu实现","authors":"E. Krishnasamy, I. Vasileska, L. Kos, P. Bouvry","doi":"10.1109/ICCCS57501.2023.10150523","DOIUrl":null,"url":null,"abstract":"Numerical modelling in fusion physics is crucial for studying fusion devices, space, and astrophysical systems. The plasma simulations of fusion devices demand a kinetic approach to handle extreme nonlinearities methods. One of the most used plasma kinetic simulation codes is the Particle-In-Cell (PIC). The HPC systems worldwide are getting more powerful with the combination of CPU, GPU, and other accelerators (e.g., FPGAs and Quantum Processors). Moreover, we can already notice that several exascale machines are operational worldwide; one typical example is the Frontier (Oak Ridge National Laboratory) exascale machine. In parallel, the same effort is being made for scientific algorithms to use robust HPC systems efficiently. Many programming frameworks (e.g., OpenACC, OpenMP offloading, and SYCL) mainly offer excellent support portability to the existing scientific codes to use the exascale HPC systems. This work demonstrates hybrid and multiple GPUs capabilities (or portability) for Simple Particle-In-Cell (SIMPIC) based on the PIC algorithm. First, we have implemented the hybrid (MPI+OpenMP) portability and multiple GPUs (multiple node GPU with the help of MPI) offloading portability. The first implementation gains a speed up to 40% compared to the plain MPI version, and the second implementation achieves up to 40% speedups compared to the hybrid (MPI+OpenMP) implementation.","PeriodicalId":266168,"journal":{"name":"2023 8th International Conference on Computer and Communication Systems (ICCCS)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hybrid programming and multiple GPUs implementation for Particle-In-Cell\",\"authors\":\"E. Krishnasamy, I. Vasileska, L. Kos, P. Bouvry\",\"doi\":\"10.1109/ICCCS57501.2023.10150523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Numerical modelling in fusion physics is crucial for studying fusion devices, space, and astrophysical systems. The plasma simulations of fusion devices demand a kinetic approach to handle extreme nonlinearities methods. One of the most used plasma kinetic simulation codes is the Particle-In-Cell (PIC). The HPC systems worldwide are getting more powerful with the combination of CPU, GPU, and other accelerators (e.g., FPGAs and Quantum Processors). Moreover, we can already notice that several exascale machines are operational worldwide; one typical example is the Frontier (Oak Ridge National Laboratory) exascale machine. In parallel, the same effort is being made for scientific algorithms to use robust HPC systems efficiently. Many programming frameworks (e.g., OpenACC, OpenMP offloading, and SYCL) mainly offer excellent support portability to the existing scientific codes to use the exascale HPC systems. This work demonstrates hybrid and multiple GPUs capabilities (or portability) for Simple Particle-In-Cell (SIMPIC) based on the PIC algorithm. First, we have implemented the hybrid (MPI+OpenMP) portability and multiple GPUs (multiple node GPU with the help of MPI) offloading portability. 
The first implementation gains a speed up to 40% compared to the plain MPI version, and the second implementation achieves up to 40% speedups compared to the hybrid (MPI+OpenMP) implementation.\",\"PeriodicalId\":266168,\"journal\":{\"name\":\"2023 8th International Conference on Computer and Communication Systems (ICCCS)\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 8th International Conference on Computer and Communication Systems (ICCCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCCS57501.2023.10150523\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 8th International Conference on Computer and Communication Systems (ICCCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCS57501.2023.10150523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hybrid programming and multiple GPUs implementation for Particle-In-Cell
Numerical modelling in fusion physics is crucial for studying fusion devices, space plasmas, and astrophysical systems. Plasma simulations of fusion devices demand a kinetic approach to handle extreme nonlinearities. One of the most widely used plasma kinetic simulation methods is Particle-In-Cell (PIC). HPC systems worldwide are becoming more powerful through combinations of CPUs, GPUs, and other accelerators (e.g., FPGAs and quantum processors). Several exascale machines are already operational; a typical example is Frontier at Oak Ridge National Laboratory. In parallel, a comparable effort is under way to make scientific algorithms use these powerful HPC systems efficiently. Programming frameworks such as OpenACC, OpenMP offloading, and SYCL offer portability paths for existing scientific codes onto exascale HPC systems. This work demonstrates hybrid and multi-GPU capabilities (portability) for Simple Particle-In-Cell (SIMPIC), a code based on the PIC algorithm. We implemented a hybrid (MPI+OpenMP) version and a multi-GPU version (GPUs across multiple nodes, coordinated with MPI) using offloading. The hybrid implementation achieves speedups of up to 40% over the plain MPI version, and the multi-GPU implementation achieves speedups of up to 40% over the hybrid (MPI+OpenMP) implementation.
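As an illustration of the first approach, here is a minimal hybrid MPI+OpenMP sketch of a one-dimensional particle push, the inner kernel of a PIC step. It is not taken from SIMPIC: the particle count NP, the normalized constants QM and DT, and the field() interpolation stub are assumptions made for a self-contained example. Each MPI rank owns its slab of particles, and OpenMP threads share the rank-local push loop.

```c
/* Hypothetical hybrid MPI+OpenMP particle push (not SIMPIC source). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NP 1000000          /* particles per rank (assumed) */
#define QM (-1.0)           /* charge-to-mass ratio, normalized units */
#define DT 1e-3             /* time step, normalized units */

/* Placeholder field interpolation at a particle position. */
static double field(double xp) { return -xp; }

int main(int argc, char **argv) {
    int provided, rank;
    /* Request threaded MPI, since OpenMP threads run between MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *x = malloc(NP * sizeof *x);
    double *v = malloc(NP * sizeof *v);
    for (int i = 0; i < NP; i++) { x[i] = (double)i / NP; v[i] = 0.0; }

    /* Rank-local particle push, shared among OpenMP threads. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < NP; i++) {
        v[i] += QM * field(x[i]) * DT;  /* accelerate */
        x[i] += v[i] * DT;              /* move */
    }

    if (rank == 0) printf("pushed %d particles per rank\n", NP);
    free(x); free(v);
    MPI_Finalize();
    return 0;
}
```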
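The multi-GPU variant described in the abstract pairs MPI with OpenMP target offloading. The sketch below, again hypothetical rather than SIMPIC code, binds each MPI rank to one of the node's GPUs by rank-modulo-device-count and offloads its push loop with `target teams distribute parallel for`; the map clauses handle host-device transfers.

```c
/* Hypothetical MPI + OpenMP-offloading sketch: one GPU per rank. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NP 1000000          /* particles per rank (assumed) */
#define QM (-1.0)           /* charge-to-mass ratio, normalized units */
#define DT 1e-3             /* time step, normalized units */

int main(int argc, char **argv) {
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Map this rank to one of the node's GPUs; fall back to the host
       if no device is visible. */
    int ndev = omp_get_num_devices();
    if (ndev > 0) omp_set_default_device(rank % ndev);

    double *x = malloc(NP * sizeof *x);
    double *v = malloc(NP * sizeof *v);
    for (int i = 0; i < NP; i++) { x[i] = (double)i / NP; v[i] = 0.0; }

    /* Offload the push; the map clauses copy the particle arrays to
       the GPU and bring the updated values back. The field E(x) = -x
       is inlined to keep the target region self-contained. */
    #pragma omp target teams distribute parallel for \
            map(tofrom: x[0:NP], v[0:NP])
    for (int i = 0; i < NP; i++) {
        v[i] += QM * (-x[i]) * DT;
        x[i] += v[i] * DT;
    }

    if (rank == 0) printf("rank 0 used device %d of %d\n",
                          ndev > 0 ? rank % ndev : -1, ndev);
    free(x); free(v);
    MPI_Finalize();
    return 0;
}
```

Launching one rank per GPU (e.g., four ranks on a four-GPU node) keeps the device mapping trivial; SIMPIC's actual domain decomposition and rank-to-device policy may differ.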