Ayman Tarakji, Lukas Börger, R. Leupers
DOI: 10.1145/2716282.2716293
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, February 7, 2015
A comparative investigation of device-specific mechanisms for exploiting HPC accelerators
A variety of computational accelerators has improved substantially in recent years. Intel's MIC (Many Integrated Core) architecture and the two GPU architectures, NVIDIA's Kepler and AMD's Graphics Core Next, all represent real innovations in the field of HPC. Using OpenCL as a single, unified programming interface, this paper reports a careful study of a well-considered selection of such devices. A micro-benchmark suite is designed and implemented to investigate each accelerator's capability to exploit parallelism in OpenCL. Our results expose the relationship between several programming aspects and their possible impact on performance. Instruction-level parallelism, intra-kernel vector parallelism, multiple issue, work-group size, instruction scheduling, and a variety of other aspects are explored, highlighting interactions that must be carefully considered when developing applications for heterogeneous architectures. Evidence-based findings on microarchitectural features and performance characteristics are cross-checked against the compiled code being executed. In conclusion, a case study involving a real application is presented to verify these findings.