Optimized two-level parallelization for GPU accelerators using the polyhedral model

Proceedings of the 26th International Conference on Compiler Construction. Pub Date: 2017-02-05. DOI: 10.1145/3033019.3033022

J. Shirako, Akihiro Hayashi, Vivek Sarkar

Citations: 11

Abstract

While GPUs play an increasingly important role in today's high-performance computers, optimizing GPU performance continues to impose large burdens upon programmers. A major challenge in optimizing codes for GPUs stems from the two levels of hardware parallelism, blocks and threads; each of these levels has significantly different characteristics, requiring different optimization strategies. In this paper, we propose a novel compiler optimization algorithm for GPU parallelism. Our approach is based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks. We extend polyhedral schedules to enable two-level parallelization through the idea of superposition, which integrates separate schedules for block-level and thread-level parallelism. Our experimental results demonstrate that our proposed compiler optimization framework can deliver 1.8x and 2.1x geometric mean improvements on NVIDIA Tesla M2050 and K80 GPUs, compared to a state-of-the-art polyhedral parallel code generator (PPCG) for GPGPUs.
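The abstract describes the two hardware levels only at a high level. The Python sketch below is a rough illustration of what "separate schedules for block-level and thread-level parallelism" can mean for a simple 2D loop nest. It is not the paper's superposition algorithm, whose formulation is not given here. A block-level mapping and a thread-level mapping are written as two independent functions of the loop iterators, then combined into a single (block, thread) placement, and the resulting grid is simulated sequentially on the CPU. The tile sizes, the matrix-addition kernel, and every function name are hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical tile sizes: each TILE_I x TILE_J tile is one GPU block,
# and each point inside a tile is handled by one thread of that block.
TILE_I, TILE_J = 4, 8


def block_schedule(i, j):
    """Block-level mapping: index of the block (tile) that owns iteration (i, j)."""
    return (i // TILE_I, j // TILE_J)


def thread_schedule(i, j):
    """Thread-level mapping: index of the thread inside its block for (i, j)."""
    return (i % TILE_I, j % TILE_J)


def combined(i, j):
    """Combine the two separate mappings into one (block, thread) placement."""
    return block_schedule(i, j), thread_schedule(i, j)


def covers_exactly_once(n, m):
    """True if every iteration of an n x m nest gets a distinct (block, thread) slot."""
    slots = [combined(i, j) for i in range(n) for j in range(m)]
    return len(set(slots)) == n * m


def matadd_two_level(a, b):
    """Compute c = a + b by sequentially simulating a grid of blocks of threads."""
    n, m = len(a), len(a[0])
    c = [[0] * m for _ in range(n)]
    # Number of blocks per dimension (ceiling division).
    grid = ((n + TILE_I - 1) // TILE_I, (m + TILE_J - 1) // TILE_J)
    for bx, by in product(range(grid[0]), range(grid[1])):    # block level
        for tx, ty in product(range(TILE_I), range(TILE_J)):   # thread level
            # Invert the combined mapping to recover the original iteration.
            i, j = bx * TILE_I + tx, by * TILE_J + ty
            if i < n and j < m:                                # guard partial tiles
                assert combined(i, j) == ((bx, by), (tx, ty))
                c[i][j] = a[i][j] + b[i][j]
    return c
```

On a real GPU the outer `product` loop would correspond to CUDA's `blockIdx` and the inner one to `threadIdx`, and both would run concurrently. The sketch keeps them sequential so that the mapping can be checked directly. For example, `covers_exactly_once(6, 11)` confirms that no iteration is lost or duplicated even when the tiles do not divide the array evenly.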
