Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption

Kyriakos Georgiou, Craig Blackmore, S. X. D. Souza, K. Eder
Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems
Published: 2018-02-27
DOI: 10.1145/3207719.3207727 (https://doi.org/10.1145/3207719.3207727)
Citations: 18

Abstract

This paper presents the observation that performing fewer of the optimizations available in a standard compiler optimization level such as -O2, while preserving their original ordering, can yield significant savings in both execution time and energy consumption. This observation has been validated on two embedded processors, the ARM Cortex-M0 and the ARM Cortex-M3, using two versions of the LLVM compilation framework: v3.8 and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated performance gains for at least half of the benchmarks on both processors. An average execution time reduction of 2.4% and 5.3% was achieved across all the benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with execution time improvements ranging from 1% up to 90% over -O2. These savings are in the same range as those achieved by state-of-the-art compilation approaches that use iterative compilation or machine learning to select flags or to determine phase orderings that result in more efficient code. In contrast to these time-consuming and expensive-to-apply techniques, our approach only needs to test a limited number of optimization configurations, fewer than 64, to obtain similar or even better savings. Furthermore, our approach can support multi-criteria optimization, as it targets execution time, energy consumption and code size at the same time.
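The idea of testing a small set of order-preserving, reduced configurations can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that each configuration is a prefix of the ordered pass list; the abstract only states that fewer optimizations are applied in their original order and that fewer than 64 configurations are tested. The pass list below is a short, hypothetical stand-in rather than the real -O2 pipeline, and `toy_measure` returns made-up costs in place of real execution time or energy measurements.

```python
from typing import Callable, List, Sequence

# Hypothetical, shortened stand-in for an ordered -O2 pass list.
# The names are real LLVM passes, but this is NOT the actual -O2 sequence.
O2_PASSES: List[str] = ["simplifycfg", "sroa", "early-cse",
                        "instcombine", "licm", "gvn"]


def prefix_configurations(passes: Sequence[str]) -> List[List[str]]:
    """Return every order-preserving prefix, from 1 pass up to the full list."""
    return [list(passes[:n]) for n in range(1, len(passes) + 1)]


def pick_best(configs: Sequence[List[str]],
              measure: Callable[[List[str]], float]) -> List[str]:
    """Return the configuration with the lowest measured cost.

    min() keeps the first minimum, so ties go to the shorter configuration
    when configs are ordered from shortest to longest.
    """
    return min(configs, key=measure)


def toy_measure(config: List[str]) -> float:
    """Made-up cost per configuration length (assumption, not measured data)."""
    toy_cost = {1: 100.0, 2: 90.0, 3: 80.0, 4: 70.0, 5: 75.0, 6: 78.0}
    return toy_cost[len(config)]


if __name__ == "__main__":
    configs = prefix_configurations(O2_PASSES)
    best = pick_best(configs, toy_measure)
    print(len(configs), "configurations tested; best:", best)
```

In a real setup, `toy_measure` would be replaced by compiling a benchmark with the given passes and measuring it on the target. The search stays linear in the number of passes, which is consistent with the small number of configurations the abstract reports.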