通过自动源到源转换的cpu加速应用程序的可移植性

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2019-01-14 DOI:10.1145/3293320.3293338

P. Sathre, M. Gardner, Wu-chun Feng

{"title":"通过自动源到源转换的cpu加速应用程序的可移植性","authors":"P. Sathre, M. Gardner, Wu-chun Feng","doi":"10.1145/3293320.3293338","DOIUrl":null,"url":null,"abstract":"Over the past decade, accelerator-based supercomputers have grown from 0% to 42% performance share on the TOP500. Ideally, GPU-accelerated code on such systems should be \"write once, run anywhere,\" regardless of the GPU device (or for that matter, any parallel device, e.g., CPU or FPGA). In practice, however, portability can be significantly more limited due to the sheer volume of code implemented in non-portable languages. For example, the tremendous success of CUDA, as evidenced by the vast cornucopia of CUDA-accelerated applications, makes it infeasible to manually rewrite all these applications to achieve portability. Consequently, we achieve portability by using our automated CUDA-to-OpenCL source-to-source translator called CU2CL. To demonstrate the state of the practice, we use CU2CL to automatically translate three medium-to-large, CUDA-optimized codes to OpenCL, thus enabling the codes to run on other GPU-accelerated systems (as well as CPU- or FPGA-based systems). These automatically translated codes deliver performance portability, including as much as three-fold performance improvement, on a GPU device not supported by CUDA.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation\",\"authors\":\"P. Sathre, M. Gardner, Wu-chun Feng\",\"doi\":\"10.1145/3293320.3293338\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Over the past decade, accelerator-based supercomputers have grown from 0% to 42% performance share on the TOP500. Ideally, GPU-accelerated code on such systems should be \\\"write once, run anywhere,\\\" regardless of the GPU device (or for that matter, any parallel device, e.g., CPU or FPGA). In practice, however, portability can be significantly more limited due to the sheer volume of code implemented in non-portable languages. For example, the tremendous success of CUDA, as evidenced by the vast cornucopia of CUDA-accelerated applications, makes it infeasible to manually rewrite all these applications to achieve portability. Consequently, we achieve portability by using our automated CUDA-to-OpenCL source-to-source translator called CU2CL. To demonstrate the state of the practice, we use CU2CL to automatically translate three medium-to-large, CUDA-optimized codes to OpenCL, thus enabling the codes to run on other GPU-accelerated systems (as well as CPU- or FPGA-based systems). These automatically translated codes deliver performance portability, including as much as three-fold performance improvement, on a GPU device not supported by CUDA.\",\"PeriodicalId\":314778,\"journal\":{\"name\":\"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3293320.3293338\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3293320.3293338","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 15

摘要

在过去的十年中，基于加速器的超级计算机在TOP500中的性能份额从0%增长到42%。理想情况下，此类系统上的GPU加速代码应该是“一次写入，随处运行”，而不考虑GPU设备(或者就此而言，任何并行设备，例如CPU或FPGA)。然而，在实践中，由于用不可移植语言实现的大量代码，可移植性可能会受到更大的限制。例如，CUDA的巨大成功，正如大量CUDA加速应用程序所证明的那样，使得手动重写所有这些应用程序以实现可移植性变得不可行的。因此，我们通过使用称为CU2CL的自动化CUDA-to-OpenCL源代码到源代码转换器来实现可移植性。为了演示实践状态，我们使用CU2CL自动将三个中型到大型的cuda优化代码转换为OpenCL，从而使代码能够在其他gpu加速系统(以及基于CPU或fpga的系统)上运行。在CUDA不支持的GPU设备上，这些自动翻译的代码提供了性能可移植性，包括多达三倍的性能提升。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

On the Portability of CPU-Accelerated Applications via Automated Source-to-Source Translation

Over the past decade, accelerator-based supercomputers have grown from 0% to 42% performance share on the TOP500. Ideally, GPU-accelerated code on such systems should be "write once, run anywhere," regardless of the GPU device (or for that matter, any parallel device, e.g., CPU or FPGA). In practice, however, portability can be significantly more limited due to the sheer volume of code implemented in non-portable languages. For example, the tremendous success of CUDA, as evidenced by the vast cornucopia of CUDA-accelerated applications, makes it infeasible to manually rewrite all these applications to achieve portability. Consequently, we achieve portability by using our automated CUDA-to-OpenCL source-to-source translator called CU2CL. To demonstrate the state of the practice, we use CU2CL to automatically translate three medium-to-large, CUDA-optimized codes to OpenCL, thus enabling the codes to run on other GPU-accelerated systems (as well as CPU- or FPGA-based systems). These automatically translated codes deliver performance portability, including as much as three-fold performance improvement, on a GPU device not supported by CUDA.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region

自引率

0.00%

发文量