Divergence Analysis with Affine Constraints

2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing. Publication date: 2012-10-24. DOI: 10.1109/SBAC-PAD.2012.22

Diogo Sampaio, R. M. Souza, Caroline Collange, Fernando Magno Quintão Pereira


Citations: 15

Abstract

The rising popularity of graphics processing units is bringing renewed interest in code optimization techniques for SIMD processors. Many of these optimizations rely on divergence analyses, which classify a variable as uniform if it has the same value on every thread, or as divergent if it might not. This paper introduces a new kind of divergence analysis that can represent variables as affine functions of thread identifiers. We have implemented this analysis in Ocelot, an open-source compiler, and used it to analyze a suite of 177 CUDA kernels from well-known benchmarks. The analysis marks about one fourth of all program variables as affine functions of thread identifiers. In addition to the novel divergence analysis, we introduce the notion of a divergence-aware register allocator. This allocator uses information from our analysis either to rematerialize affine variables or to move uniform variables to shared memory. As evidence of its effectiveness, our divergence-aware allocator produces GPU code that is 29.70% faster than the code produced by Ocelot's register allocator. Divergence analysis with affine constraints has been publicly available in the Ocelot compiler since June 2012.
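To make the uniform / affine / divergent distinction concrete, here is a toy Python sketch of an abstract domain in which a value is tracked as slope * tid + base. It is only an illustration under simplifying assumptions and is not the analysis implemented in Ocelot: the names Affine, DIVERGENT, add, mul and classify are hypothetical, the slope is assumed to be a compile-time constant, and control flow is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Affine:
    """Abstract value slope * tid + base (hypothetical toy domain).

    base is None when the base is unknown at compile time but is
    still the same on every thread (for example a kernel parameter).
    """
    slope: int
    base: Optional[int] = None


# Sentinel for values with no known affine form.
DIVERGENT = None


def classify(v):
    """Name the class of an abstract value."""
    if v is DIVERGENT:
        return "divergent"
    return "uniform" if v.slope == 0 else "affine"


def add(x, y):
    """Sum of two affine values is affine: slopes add, bases add."""
    if x is DIVERGENT or y is DIVERGENT:
        return DIVERGENT
    base = None if x.base is None or y.base is None else x.base + y.base
    return Affine(x.slope + y.slope, base)


def mul(x, y):
    """Product stays affine only when one factor is a known constant."""
    if x is DIVERGENT or y is DIVERGENT:
        return DIVERGENT
    for c, other in ((x, y), (y, x)):
        if c.slope == 0 and c.base is not None:
            base = None if other.base is None else c.base * other.base
            return Affine(c.base * other.slope, base)
    if x.slope == 0 and y.slope == 0:
        # Two uniform values multiply to a uniform value.
        return Affine(0, None)
    # Conservative fallback, e.g. tid * tid or unknown-uniform * tid.
    return DIVERGENT


tid = Affine(1, 0)      # the thread identifier itself
four = Affine(0, 4)     # compile-time constant
ptr = Affine(0, None)   # kernel argument: unknown but uniform

addr = add(ptr, mul(four, tid))   # models ptr + 4 * tid
```

In this sketch `addr` comes out as `Affine(4, None)`, that is, an address with a known per-thread stride of 4 and a uniform base, whereas `mul(tid, tid)` falls back to `DIVERGENT`. A value of the first kind is the sort of variable the abstract describes as a candidate for rematerialization, since it can be recomputed from the thread identifier and a uniform base.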
