PLUTO+: near-complete modeling of affine transformations for parallelism and locality

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Pub Date: 2015-01-24. DOI: 10.1145/2688500.2688512

Aravind Acharya, Uday Bondhugula


Citations: 26

Abstract

Affine transformations have proven to be very powerful for loop restructuring because they can model a very wide range of transformations. A single multi-dimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures where coarse-grained parallelism and locality are crucial, consider only a sub-space of transformations to avoid a combinatorial explosion in finding the transformations. The ensuing practical trade-offs lead to the exclusion of certain useful transformations, in particular transformation compositions involving loop reversals and loop skewing by negative factors. In this paper, we propose an approach to address this limitation by modeling a much larger space of affine transformations in conjunction with the Pluto algorithm's cost function. We perform an experimental evaluation of both the effect on compilation time and the performance of generated code. The evaluation shows that our new framework, Pluto+, causes no degradation in performance on any of the Polybench benchmarks. For Lattice Boltzmann Method (LBM) codes with periodic boundary conditions, it provides a mean speedup of 1.33x over Pluto. We also show that Pluto+ does not increase compile times significantly: experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases where it improves execution time significantly, it increases polyhedral optimization time by only 2.04x.
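The statement that a single affine function can stand for a sequence of simpler transformations can be illustrated with a small sketch. The Python below is not taken from Pluto or Pluto+; the matrix names, the helper functions, and the particular sequence (reversal, then skewing, then interchange on a two-deep loop nest) are assumptions chosen only for illustration. It composes three elementary transformations of an iteration point (i, j) into one integer matrix and checks that the single matrix maps points the same way as the step-by-step sequence.

```python
# Illustrative sketch only: hypothetical helpers, not code from Pluto or Pluto+.

def matmul(a, b):
    """Multiply two small integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(t, point):
    """Map an iteration point through the linear transformation t."""
    return tuple(sum(t[r][c] * point[c] for c in range(len(point)))
                 for r in range(len(t)))

# Elementary transformations of a 2-deep loop nest with iterators (i, j).
REVERSE_J = [[1, 0], [0, -1]]    # (i, j) -> (i, -j)
SKEW_J_BY_I = [[1, 0], [1, 1]]   # (i, j) -> (i, i + j)
INTERCHANGE = [[0, 1], [1, 0]]   # (i, j) -> (j, i)

def compose(*steps):
    """Fold steps, applied in the order given, into a single matrix."""
    result = [[1, 0], [0, 1]]
    for step in steps:
        # A later step acts on the output of the earlier ones,
        # so its matrix multiplies on the left.
        result = matmul(step, result)
    return result

COMBINED = compose(REVERSE_J, SKEW_J_BY_I, INTERCHANGE)

def apply_sequence(point, *steps):
    """Apply the steps one after another, for comparison with COMBINED."""
    for step in steps:
        point = apply(step, point)
    return point

if __name__ == "__main__":
    print(COMBINED)
    print(apply(COMBINED, (2, 5)))
```

The combined matrix contains a negative coefficient, which comes from the reversal step; compositions of this kind (reversals and skewing by negative factors) are the ones the abstract describes as excluded by the restricted search space.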

