Performance portable Vlasov code with C++ parallel algorithm

2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). Publication date: 2022-11-01. DOI: 10.1109/P3HPC56579.2022.00012

Y. Asahi, T. Padioleau, G. Latu, Julien Bigot, V. Grandgirard, K. Obrejan



Abstract

This paper presents a performance-portable implementation of a kinetic plasma simulation code that uses C++ parallel algorithms to run across multiple CPUs and GPUs. Relying on the language-standard parallelism stdpar and the proposed language-standard multi-dimensional array support mdspan, we demonstrate that a performance-portable implementation is possible without harming readability and productivity. We obtain good overall performance for a mini-application, within 20% of the Kokkos version on Intel Icelake CPUs and NVIDIA V100 and A100 GPUs. Our conclusion is that stdpar can be a good candidate for developing performance-portable and productive code targeting Exascale-era platforms, assuming this approach becomes available on AMD and/or Intel GPUs in the future.
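To make the stdpar idea concrete, the following is a minimal sketch and not the authors' code. It applies a standard algorithm with a parallel execution policy to a two-dimensional phase-space array f(x, v). The names `View2D` and `shift_in_x` are hypothetical; `View2D` is a hand-written stand-in for mdspan, used here because library support for mdspan still varies between compilers. The toy kernel shifts the array by one cell in x with periodic wrap-around, which is far simpler than a real Vlasov solver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical non-owning 2D view (row-major), standing in for mdspan.
struct View2D {
    double* data;
    std::size_t nx, nv;
    double& operator()(std::size_t ix, std::size_t iv) const {
        return data[ix * nv + iv];
    }
};

// Toy kernel: out(ix, iv) = in(ix - 1, iv) with periodic wrap-around in x.
// Each flat index writes exactly one distinct element, so there are no data races.
void shift_in_x(View2D in, View2D out) {
    assert(in.nx == out.nx && in.nv == out.nv);

    // Flat index space 0 .. nx*nv-1 that the parallel algorithm iterates over.
    std::vector<std::size_t> idx(in.nx * in.nv);
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // The execution policy is the only parallel construct in the code.
    std::for_each(std::execution::par_unseq, idx.begin(), idx.end(),
                  [=](std::size_t i) {
                      const std::size_t ix = i / in.nv;
                      const std::size_t iv = i % in.nv;
                      const std::size_t src = (ix + in.nx - 1) % in.nx;
                      out(ix, iv) = in(src, iv);
                  });
}
```

A caller would wrap two `std::vector<double>` buffers of size `nx * nv` in `View2D` objects and pass them to `shift_in_x`. The same source can in principle be built for multicore CPUs or, with a compiler that offloads standard parallel algorithms such as NVIDIA's nvc++ with `-stdpar=gpu`, for GPUs. With GCC, parallel policies typically require linking against TBB.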
