A model for fusion and code motion in an automatic parallelizing compiler

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT). Pub Date: 2010-09-11. DOI: 10.1145/1854273.1854317

Uday Bondhugula, O. Günlük, S. Dash, Lakshminarayanan Renganarayanan


Citations: 64

Abstract

Loop fusion has been studied extensively, but in isolation from other transformations. This was mainly due to the lack of a powerful intermediate representation for applying compositions of high-level transformations. Fusion interacts strongly with parallelism and locality. Currently, there exist no models to determine good fusion structures integrated with all components of an auto-parallelizing compiler. This is also one of the reasons why the full benefits of optimizing and automatically parallelizing long sequences of loop nests spanning hundreds of lines of code have never been explored. We present a fusion model in an integrated automatic parallelization framework that simultaneously optimizes for hardware prefetch stream buffer utilization, locality, and parallelism. Characterizing the legal space of fusion structures in the polyhedral compiler framework is not difficult. However, incorporating useful optimization criteria into such a legal space to pick good fusion structures is very hard. The model we propose captures utilization of hardware prefetch streams, loss of parallelism, and constraints imposed by privatization and code expansion in a single convex optimization space. The model scales very well to program sections spanning hundreds of lines of code. It has been implemented in the polyhedral pass of the IBM XL optimizing compiler. Experimental results demonstrate its effectiveness in finding good fusion structures for codes including SPEC benchmarks and large applications. Improvements ranging from 5% to nearly 2.75× are obtained over the current production compiler optimizer on these benchmarks.
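
To make the interaction between fusion, locality, and parallelism concrete, the following is a minimal hand-written sketch in C. It is not taken from the paper and does not represent its model; the function names and the loop bodies are hypothetical and chosen only for illustration. The unfused version has two loops that are each free of loop-carried dependences, while the fused version reuses b[i - 1] one iteration after it is produced but introduces a dependence between consecutive iterations.

```c
#include <stddef.h>

/* Unfused (illustrative): two separate loops. Neither loop has a
   loop-carried dependence, so each could be parallelized on its own,
   but b[] is written in the first loop and read again in the second. */
void scale_then_add_unfused(const double *a, double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 1; i < n; i++)
        c[i] = b[i - 1] + a[i];
}

/* Fused (illustrative): one loop. b[i - 1] is consumed one iteration
   after it is produced, which improves reuse, but iteration i now
   depends on iteration i - 1, so the loop is no longer trivially
   parallel without further transformation. c[0] is left untouched,
   exactly as in the unfused version. */
void scale_then_add_fused(const double *a, double *b, double *c, size_t n) {
    if (n == 0)
        return;
    b[0] = 2.0 * a[0];              /* peeled first iteration */
    for (size_t i = 1; i < n; i++) {
        b[i] = 2.0 * a[i];
        c[i] = b[i - 1] + a[i];
    }
}
```

Both functions compute the same values; they differ only in how the work is ordered. Choosing between such orderings across many loop nests at once is the kind of decision the abstract describes as a fusion structure.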
