Use of Previously Acquired Positioning of Optimizations for Phase Ordering Exploration

Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems. Pub Date: 2015-06-01. DOI: 10.1145/2764967.2764978

Ricardo Nobre, L. G. A. Martins, João MP Cardoso

Citations: 23

Abstract

This paper presents a new approach to efficiently search for suitable compiler pass sequences, a challenge known as phase ordering. Our approach relies on information about the relative positions of compiler passes in pass sequences previously generated for a set of functions when compiling for a specific processor. We enhanced two iterative compiler pass exploration schemes, one relying on simple sequential compiler pass insertion and the other implementing an auto-tuned simulated annealing process, with a data structure that holds information about the relative positions of passes in those sequences. This data structure reduces the set of compiler passes considered for insertion at a given position of a candidate pass sequence to only those passes that have a higher probability of performing well at that relative position, which speeds up the exploration. We tested our approach with two different compilers and two different targets: the ReflectC and the LLVM compilers, targeting a MicroBlaze processor and a LEON3 processor, respectively. The experimental results show that we can reduce the number of algorithm iterations by a factor of up to more than an order of magnitude when targeting the MicroBlaze or the LEON3, while still finding compiler sequences whose binaries, when executed on the target processor/simulator, outperform (i.e., use fewer CPU cycles than) all the standard optimization levels. For each kernel, we compare against the best-performing optimization level flag (e.g., -O1, -O2 or -O3 in the case of LLVM). The geometric mean performance improvement is 1.23x and 1.20x when targeting the MicroBlaze processor, and 1.94x and 2.65x when targeting the LEON3 processor, for each of the two exploration algorithms and two kernel sets considered.
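The abstract does not specify how the position data structure is built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes relative positions are bucketed into a fixed number of bins and that candidates are ranked by how often they appeared in the relevant bin. The function names, the `num_bins` and `top_k` parameters, and the example sequences are all hypothetical; the pass names are ordinary LLVM pass names used purely for illustration.

```python
from collections import defaultdict


def build_position_table(sequences, num_bins=4):
    """Count how often each pass occurs in each relative-position bin.

    `sequences` is a list of pass sequences found in earlier explorations.
    The binning scheme is an assumption made for this sketch.
    """
    table = defaultdict(lambda: [0] * num_bins)
    for seq in sequences:
        for i, pass_name in enumerate(seq):
            # Map index i in a sequence of len(seq) to a bin in [0, num_bins).
            bin_index = min(i * num_bins // len(seq), num_bins - 1)
            table[pass_name][bin_index] += 1
    return table


def candidate_passes(table, index, length, num_bins=4, top_k=2):
    """Return up to top_k passes seen most often near the given position.

    `index` is the insertion point and `length` is the sequence length
    used to compute the relative position (both hypothetical parameters).
    """
    bin_index = min(index * num_bins // max(length, 1), num_bins - 1)
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(table, key=lambda p: (-table[p][bin_index], p))
    return [p for p in ranked[:top_k] if table[p][bin_index] > 0]


# Invented example data: sequences that a previous exploration might have kept.
previous_sequences = [
    ["mem2reg", "instcombine", "licm", "gvn"],
    ["mem2reg", "licm", "gvn", "simplifycfg"],
    ["instcombine", "mem2reg", "loop-unroll", "simplifycfg"],
]
position_table = build_position_table(previous_sequences)
early_candidates = candidate_passes(position_table, index=0, length=4)
```

An exploration loop, whether sequential insertion or simulated annealing, would then try only the passes returned by `candidate_passes` at each insertion point instead of the full pass set, which is where the reduction in iterations would come from in a scheme of this kind.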
