Porting numerical integration codes from CUDA to oneAPI: a case study

ICT systems security and privacy protection : 32nd IFIP TC 11 International Conference, SEC 2017, Rome, Italy, May 29-31, 2017, Proceedings. IFIP TC11 International Information Security Conference (32nd : 2017 : Rome, Italy). Pub Date: 2023-02-11. DOI: 10.48550/arXiv.2302.05730

Ioannis Sakiotis, K. Arumugam, M. Paterno, D. Ranjan, B. Terzić, M. Zubair

Volume 23 1, pages 339-358. Publication type: Journal Article. Platform: Semanticscholar. https://doi.org/10.48550/arXiv.2302.05730

Citations: 1

Abstract

We present our experience in porting optimized CUDA implementations to oneAPI. We focus on the use case of numerical integration, particularly the CUDA implementations of PAGANI and $m$-Cubes. We faced several challenges that caused performance degradation in the oneAPI ports. These include differences in the number of registers used per thread, compiler optimizations, and mappings of CUDA library calls to oneAPI equivalents. After addressing those challenges, we tested both the PAGANI and $m$-Cubes integrators on numerous integrands of various characteristics. To evaluate the quality of the ports, we collected performance metrics of the CUDA and oneAPI implementations on the Nvidia V100 GPU. We found that the oneAPI ports often achieve performance comparable to the CUDA versions, and that they are at most 10% slower.

