PHAST库-为gpu和多核启用单源和高性能代码

2017 International Conference on High Performance Computing & Simulation (HPCS) Pub Date : 2017-07-01 DOI:10.1109/HPCS.2017.109

Biagio Peccerillo, S. Bartolini

{"title":"PHAST库-为gpu和多核启用单源和高性能代码","authors":"Biagio Peccerillo, S. Bartolini","doi":"10.1109/HPCS.2017.109","DOIUrl":null,"url":null,"abstract":"The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted for the specific architectures, effectively tuned for on each of them while preventing from introducing biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits to write code at a high level of abstraction, to reach good performance while allowing for fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that PHAST implementation is around ten times faster than OpenCL (AMD vendor) implementation, but up to about 4x slower than OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than CUDA SDK reference version.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"PHAST Library — Enabling Single-Source and High Performance Code for GPUs and Multi-cores\",\"authors\":\"Biagio Peccerillo, S. Bartolini\",\"doi\":\"10.1109/HPCS.2017.109\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted for the specific architectures, effectively tuned for on each of them while preventing from introducing biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits to write code at a high level of abstraction, to reach good performance while allowing for fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that PHAST implementation is around ten times faster than OpenCL (AMD vendor) implementation, but up to about 4x slower than OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than CUDA SDK reference version.\",\"PeriodicalId\":115758,\"journal\":{\"name\":\"2017 International Conference on High Performance Computing & Simulation (HPCS)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Conference on High Performance Computing & Simulation (HPCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCS.2017.109\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on High Performance Computing & Simulation (HPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCS.2017.109","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

多核和gpu等并行异构架构的仿真在编程语言/框架领域提出了新的挑战。模拟器的应用程序需要以一种可以很容易地适应特定架构的方式来表达，有效地针对每个架构进行调整，同时防止由于不均匀的手工优化而引入偏差。最常见的异构编程框架都太低级了，所以我们提出了PHAST，一个针对多核和Nvidia gpu的高级异构c++库。它允许在高抽象级别上编写代码，在允许精细的参数调优的同时达到良好的性能，并且不会屏蔽代码进行低级优化。我们在两种支持的体系结构上对DCT8x8的PHAST进行了评估。在多核上，我们发现PHAST的实现比OpenCL (AMD供应商)的实现快10倍左右，但比OpenCL (Intel供应商)的实现慢4倍左右，这有效地利用了自动向量化。在Nvidia gpu上，PHAST代码的性能比CUDA SDK参考版本高出55.14%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

PHAST Library — Enabling Single-Source and High Performance Code for GPUs and Multi-cores

The simulation of parallel heterogeneous architectures such as multi-cores and GPUs sets new challenges in the programming language/framework domain. Applications for simulators need to be expressed in a way that can be easily adapted for the specific architectures, effectively tuned for on each of them while preventing from introducing biases due to non-uniform hand-made optimizations. The most common heterogeneous programming frameworks are too low-level, so we propose PHAST, a high-level heterogeneous C++ library targetable on multi-cores and Nvidia GPUs. It permits to write code at a high level of abstraction, to reach good performance while allowing for fine parameter tuning and not shielding code from low-level optimizations. We evaluate PHAST in the case of DCT8x8 on both supported architectures. On multi-cores, we found that PHAST implementation is around ten times faster than OpenCL (AMD vendor) implementation, but up to about 4x slower than OpenCL (Intel vendor) one, which effectively leverages auto-vectorization. On Nvidia GPUs, PHAST code performs up to 55.14% better than CUDA SDK reference version.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 International Conference on High Performance Computing & Simulation (HPCS)

自引率

0.00%

发文量