{"title":"Optimized two-level parallelization for GPU accelerators using the polyhedral model","authors":"J. Shirako, Akihiro Hayashi, Vivek Sarkar","doi":"10.1145/3033019.3033022","DOIUrl":null,"url":null,"abstract":"While GPUs play an increasingly important role in today's high-performance computers, optimizing GPU performance continues to impose large burdens upon programmers. A major challenge in optimizing codes for GPUs stems from the two levels of hardware parallelism, blocks and threads; each of these levels has significantly different characteristics, requiring different optimization strategies. In this paper, we propose a novel compiler optimization algorithm for GPU parallelism. Our approach is based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks. We extend polyhedral schedules to enable two-level parallelization through the idea of superposition, which integrates separate schedules for block-level and thread-level parallelism. Our experimental results demonstrate that our proposed compiler optimization framework can deliver 1.8x and 2.1x geometric mean improvements on NVIDIA Tesla M2050 and K80 GPUs, compared to a state-of-the-art polyhedral parallel code generator (PPCG) for GPGPUs.","PeriodicalId":146080,"journal":{"name":"Proceedings of the 26th International Conference on Compiler Construction","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th International Conference on Compiler Construction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3033019.3033022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
While GPUs play an increasingly important role in today's high-performance computers, optimizing GPU performance continues to impose large burdens upon programmers. A major challenge in optimizing codes for GPUs stems from the two levels of hardware parallelism, blocks and threads; each of these levels has significantly different characteristics, requiring different optimization strategies. In this paper, we propose a novel compiler optimization algorithm for GPU parallelism. Our approach is based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks. We extend polyhedral schedules to enable two-level parallelization through the idea of superposition, which integrates separate schedules for block-level and thread-level parallelism. Our experimental results demonstrate that our proposed compiler optimization framework can deliver 1.8x and 2.1x geometric mean improvements on NVIDIA Tesla M2050 and K80 GPUs, compared to a state-of-the-art polyhedral parallel code generator (PPCG) for GPGPUs.
虽然GPU在当今的高性能计算机中扮演着越来越重要的角色,但优化GPU性能仍然给程序员带来了巨大的负担。优化gpu代码的主要挑战来自硬件并行性的两个层面,块和线程;每个级别都有显著不同的特征,需要不同的优化策略。本文提出了一种新的GPU并行编译优化算法。我们的方法基于多面体模型,与传统的基于ast的框架相比,它在程序分析和转换方面取得了重大进展。我们扩展了多面体调度,通过叠加的思想来实现两级并行,它集成了块级和线程级并行的单独调度。我们的实验结果表明,与最先进的gpgpu多面体并行代码生成器(PPCG)相比,我们提出的编译器优化框架可以在NVIDIA Tesla M2050和K80 gpu上提供1.8倍和2.1倍的几何平均改进。