Revisiting loop fusion in the polyhedral framework

ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming · Pub Date: 2014-02-06 · DOI: 10.1145/2555243.2555250

Sanyam Mehta, P. Lin, P. Yew


Citations: 27

Abstract

Loop fusion is an important compiler optimization that improves memory hierarchy performance by enabling data reuse. Traditional compilers have approached loop fusion in a manner decoupled from other high-level loop optimizations, missing several interesting solutions. Recently, the polyhedral compiler framework, with its ability to compose complex transformations, has proved promising for loop optimization of small programs. However, our experiments with large programs using state-of-the-art polyhedral compiler frameworks reveal suboptimal fusion partitions in the transformed code. We trace this to the lack of an effective cost model for choosing a good fusion partitioning among the possible choices, whose number increases exponentially with the number of program statements. In this paper, we propose a fusion algorithm that chooses good fusion partitions with two objectives: achieving good data reuse and preserving the parallelism inherent in the source code. These objectives, although targeted by previous work in traditional compilers, pose new challenges within the polyhedral compiler framework and have thus not been addressed there. In our algorithm, we propose several heuristics that work effectively within the polyhedral compiler framework and allow us to achieve the proposed objectives. Experimental results show that our fusion algorithm achieves performance comparable to that of existing polyhedral compilers for small kernel programs, and significantly outperforms them for large benchmark programs such as those in the SPEC benchmark suite.
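To make the two central ideas of the abstract concrete, here is a minimal sketch in Python. It is not the paper's algorithm or cost model. The function names (`unfused`, `fused`, `contiguous_partitions`) and the statement labels `S1`, `S2`, ... are hypothetical, chosen only for illustration. The first pair of functions shows what fusing two loops means and why it enables reuse. The last function enumerates fusion partitions under the simplifying assumption that statements keep their original order and only adjacent statements may be grouped. Even in that restricted space, n statements admit 2^(n-1) partitions.

```python
from itertools import combinations


def unfused(a):
    """Two separate loops: all of b is written before any of it is read."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):      # S1
        b[i] = 2.0 * a[i]
    for i in range(n):      # S2
        c[i] = b[i] + 1.0
    return b, c


def fused(a):
    """One fused loop: b[i] is consumed right after it is produced."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):
        b[i] = 2.0 * a[i]   # S1
        c[i] = b[i] + 1.0   # S2 reads the value S1 just wrote
    return b, c


def contiguous_partitions(statements):
    """Yield every split of an ordered statement list into contiguous groups.

    Assumption for illustration only: statement order is fixed, so a
    partition is determined by which of the n-1 gaps are cut.
    """
    n = len(statements)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            yield [statements[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(unfused(data) == fused(data))
    for size in (2, 4, 8, 16):
        labels = [f"S{j + 1}" for j in range(size)]
        print(size, sum(1 for _ in contiguous_partitions(labels)))
```

Fusion is legal in this toy case because S2 at iteration i depends only on what S1 wrote at the same iteration. Real programs need a dependence analysis to establish that, and a cost model to decide which of the many legal partitions to pick.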


