APR: A Novel Parallel Repacking Algorithm for Efficient GPGPU Parallel Code Transformation

Proceedings of Workshop on General Purpose Processing Using GPUs. Publication date: 2014-03-01. DOI: 10.1145/2588768.2576789

Yulong Yu, Xubin He, He Guo, Sihui Zhong, Yuxin Wang, Xin Chen, Weijun Xiao


Citations: 2

Abstract

General-purpose graphics processing units (GPGPUs) bring an opportunity to improve the performance of many applications. However, exploiting parallelism has low productivity in current programming frameworks such as CUDA and OpenCL. Programmers have to consider and handle many GPGPU architecture details, so balancing programmability against the efficiency of performance tuning is a challenge. Parallel Repacking (PR) is a popular performance tuning approach for GPGPU applications that improves performance by changing the parallel granularity. Existing code transformation algorithms that use PR increase productivity, but they do not cover enough code patterns and do not provide effective error detection. In this paper, we propose a novel parallel repacking algorithm (APR) that covers a wide range of code patterns and improves efficiency. We develop an efficient code model that expresses a GPGPU program as a recursive statement sequence and introduces the concept of a singular statement. Building on this model, APR applies appropriate transformation rules to singular and non-singular statements to generate the repacked code, and it performs a recursive transformation whenever it encounters a branching or loop singular statement. Additionally, singular statements unify the transformation of barriers and data sharing, and they enable APR to detect barrier errors. Experimental results based on a prototype show that APR covers more code patterns than existing solutions such as the automatic thread coarsening in Crest, and that code repacked with APR achieves speedups of up to 3.28X, in some cases outperforming manually tuned repacked code.
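As a rough illustration of what "changing the parallel granularity" means, the following C++ sketch simulates an element-wise kernel on the CPU before and after thread coarsening, the technique Crest automates. The kernel and the names `vecAddThread`, `launchOriginal`, `launchRepacked` and `factor` are hypothetical. This shows generic thread coarsening, not APR's own transformation rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread body of an element-wise kernel: the logical
// thread with global index `tid` computes c[tid] = a[tid] + b[tid].
void vecAddThread(std::size_t tid, const std::vector<float>& a,
                  const std::vector<float>& b, std::vector<float>& c) {
    if (tid < c.size()) c[tid] = a[tid] + b[tid];  // guard for padding threads
}

// Original granularity: one logical thread per element
// (the loop is a sequential CPU stand-in for the GPU grid).
void launchOriginal(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c) {
    for (std::size_t tid = 0; tid < c.size(); ++tid)
        vecAddThread(tid, a, b, c);
}

// Repacked (coarsened) granularity: each new thread does the work of
// `factor` original threads, so the grid shrinks to ceil(n / factor) threads.
void launchRepacked(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, std::size_t factor) {
    assert(factor > 0 && "repacking factor must be positive");
    const std::size_t n = c.size();
    const std::size_t newThreads = (n + factor - 1) / factor;
    for (std::size_t t = 0; t < newThreads; ++t)        // new, coarser threads
        for (std::size_t k = 0; k < factor; ++k)        // merged original threads
            vecAddThread(t + k * newThreads, a, b, c);  // strided index remapping
}
```

The strided mapping `t + k * newThreads` keeps neighbouring new threads on neighbouring elements, which on a real GPU tends to preserve coalesced memory access. The contiguous mapping `t * factor + k` is the other common choice. Both versions produce the same `c`; only the number of threads and the work per thread change.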

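A toy transformer can illustrate the code model described in the abstract, in which a program is a recursive statement sequence containing singular statements. Everything below is an assumption made for illustration and is not taken from the paper. This includes the `Stmt` structure, the rule that a statement is singular when it is or contains a barrier, and the specific rewrite rules. Under these hypothetical rules:

- Non-singular statements are replicated once per merged thread.
- A barrier is emitted once.
- A uniform branch or loop that contains a barrier is kept once and transformed recursively.
- A barrier under a thread-dependent condition is reported as an error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy code model: a kernel body is a recursive sequence of statements.
enum class Kind { Simple, Barrier, Branch, Loop };

struct Stmt {
    Kind kind;
    std::string text;              // statement text, or the condition for Branch/Loop
    bool threadDependent = false;  // does the Branch/Loop condition depend on the thread index?
    std::vector<Stmt> body;        // nested statement sequence for Branch/Loop
};

// Assumed rule: a statement is singular if it is a barrier or (recursively) contains one.
bool isSingular(const Stmt& s) {
    if (s.kind == Kind::Barrier) return true;
    for (const Stmt& child : s.body)
        if (isSingular(child)) return true;
    return false;
}

// Opening text of a compound (Branch/Loop) statement.
std::string header(const Stmt& s) {
    return std::string(s.kind == Kind::Branch ? "if (" : "while (") + s.text + ") {";
}

// Render a statement, including any nested body, on a single line.
std::string render(const Stmt& s) {
    if (s.kind == Kind::Simple || s.kind == Kind::Barrier) return s.text;
    std::string r = header(s);
    for (const Stmt& child : s.body) r += " " + render(child);
    return r + " }";
}

// Repack `seq` so each new thread does the work of `factor` original threads.
// Appends generated lines to `out`; returns false (with a message in `error`)
// when a barrier sits under a thread-dependent condition.
bool repack(const std::vector<Stmt>& seq, int factor,
            std::vector<std::string>& out, std::string& error) {
    for (const Stmt& s : seq) {
        if (!isSingular(s)) {
            // Non-singular: replicate once per merged thread k
            // (index remapping inside the statement is omitted in this toy).
            out.push_back("for (k = 0; k < " + std::to_string(factor) + "; ++k) { " +
                          render(s) + " }");
        } else if (s.kind == Kind::Barrier) {
            out.push_back(s.text);  // one barrier now serves all merged threads
        } else if (s.threadDependent) {
            error = "barrier under thread-dependent condition: " + s.text;
            return false;
        } else {
            // Uniform branch/loop containing a barrier: keep it once and recurse.
            out.push_back(header(s));
            if (!repack(s.body, factor, out, error)) return false;
            out.push_back("}");
        }
    }
    return true;
}
```

Because the statements before a barrier are wrapped in their own `k` loop, all merged threads finish that phase before the single barrier is reached. The toy ignores private variables that stay live across a barrier. A real repacking pass would have to expand such variables per merged thread, which relates to the data-sharing case the abstract mentions.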