基于机器学习优化的实用语义程序属性聚合

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2010-10-24 DOI:10.1145/1878921.1878951

Mircea Namolaru, Albert Cohen, G. Fursin, A. Zaks, Ari Freund

{"title":"基于机器学习优化的实用语义程序属性聚合","authors":"Mircea Namolaru, Albert Cohen, G. Fursin, A. Zaks, Ari Freund","doi":"10.1145/1878921.1878951","DOIUrl":null,"url":null,"abstract":"Iterative search combined with machine learning is a promising approach to design optimizing compilers harnessing the complexity of modern computing systems. While traversing a program optimization space, we collect characteristic feature vectors of the program, and use them to discover correlations across programs, target architectures, data sets, and performance. Predictive models can be derived from such correlations, effectively hiding the time-consuming feedback-directed optimization process from the application programmer.\n One key task of this approach, naturally assigned to compiler experts, is to design relevant features and implement scalable feature extractors, including statistical models that filter the most relevant information from millions of lines of code. This new task turns out to be a very challenging and tedious one from a compiler construction perspective. So far, only a limited set of ad-hoc, largely syntactical features have been devised. Yet machine learning is only able to discover correlations from information it is fed with: it is critical to select topical program features for a given optimization problem in order for this approach to succeed.\n We propose a general method for systematically generating numerical features from a program. This method puts no restrictions on how to logically and algebraically aggregate semantical properties into numerical features. We illustrate our method on the difficult problem of selecting the best possible combination of 88 available optimizations in GCC. We achieve 74% of the potential speedup obtained through iterative compilation on a wide range of benchmarks and four different general-purpose and embedded architectures. Our work is particularly relevant to embedded system designers willing to quickly adapt the optimization heuristics of a mainstream compiler to their custom ISA, microarchitecture, benchmark suite and workload. Our method has been integrated with the publicly released MILEPOST GCC [14].","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":"{\"title\":\"Practical aggregation of semantical program properties for machine learning based optimization\",\"authors\":\"Mircea Namolaru, Albert Cohen, G. Fursin, A. Zaks, Ari Freund\",\"doi\":\"10.1145/1878921.1878951\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Iterative search combined with machine learning is a promising approach to design optimizing compilers harnessing the complexity of modern computing systems. While traversing a program optimization space, we collect characteristic feature vectors of the program, and use them to discover correlations across programs, target architectures, data sets, and performance. Predictive models can be derived from such correlations, effectively hiding the time-consuming feedback-directed optimization process from the application programmer.\\n One key task of this approach, naturally assigned to compiler experts, is to design relevant features and implement scalable feature extractors, including statistical models that filter the most relevant information from millions of lines of code. This new task turns out to be a very challenging and tedious one from a compiler construction perspective. So far, only a limited set of ad-hoc, largely syntactical features have been devised. Yet machine learning is only able to discover correlations from information it is fed with: it is critical to select topical program features for a given optimization problem in order for this approach to succeed.\\n We propose a general method for systematically generating numerical features from a program. This method puts no restrictions on how to logically and algebraically aggregate semantical properties into numerical features. We illustrate our method on the difficult problem of selecting the best possible combination of 88 available optimizations in GCC. We achieve 74% of the potential speedup obtained through iterative compilation on a wide range of benchmarks and four different general-purpose and embedded architectures. Our work is particularly relevant to embedded system designers willing to quickly adapt the optimization heuristics of a mainstream compiler to their custom ISA, microarchitecture, benchmark suite and workload. Our method has been integrated with the publicly released MILEPOST GCC [14].\",\"PeriodicalId\":136293,\"journal\":{\"name\":\"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"45\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1878921.1878951\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1878921.1878951","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 45

摘要

迭代搜索与机器学习相结合是一种很有前途的方法，可以利用现代计算系统的复杂性来设计优化编译器。在遍历程序优化空间时，我们收集程序的特征向量，并使用它们来发现程序、目标架构、数据集和性能之间的相关性。预测模型可以从这种相关性中得到，从而有效地向应用程序程序员隐藏耗时的反馈导向优化过程。这种方法的一个关键任务是设计相关的特性并实现可扩展的特征提取器，包括从数百万行代码中过滤最相关信息的统计模型，这自然是分配给编译器专家的。从编译器构造的角度来看，这个新任务是非常具有挑战性和乏味的。到目前为止，只设计了一组有限的特别的、主要是语法的特性。然而，机器学习只能从它所提供的信息中发现相关性:为了使这种方法成功，为给定的优化问题选择主题程序特征是至关重要的。我们提出了一种从程序系统地生成数值特征的一般方法。这种方法对如何在逻辑上和代数上将语义属性聚合为数值特征没有限制。我们举例说明了如何在GCC中选择88个可用优化的最佳组合这一难题。通过在广泛的基准测试和四种不同的通用和嵌入式架构上进行迭代编译，我们实现了74%的潜在加速。我们的工作与嵌入式系统设计人员特别相关，他们愿意快速地将主流编译器的优化启发式适应他们的自定义ISA、微架构、基准套件和工作负载。我们的方法已经与公开发布的MILEPOST GCC集成在一起[14]。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Practical aggregation of semantical program properties for machine learning based optimization

Iterative search combined with machine learning is a promising approach to design optimizing compilers harnessing the complexity of modern computing systems. While traversing a program optimization space, we collect characteristic feature vectors of the program, and use them to discover correlations across programs, target architectures, data sets, and performance. Predictive models can be derived from such correlations, effectively hiding the time-consuming feedback-directed optimization process from the application programmer. One key task of this approach, naturally assigned to compiler experts, is to design relevant features and implement scalable feature extractors, including statistical models that filter the most relevant information from millions of lines of code. This new task turns out to be a very challenging and tedious one from a compiler construction perspective. So far, only a limited set of ad-hoc, largely syntactical features have been devised. Yet machine learning is only able to discover correlations from information it is fed with: it is critical to select topical program features for a given optimization problem in order for this approach to succeed. We propose a general method for systematically generating numerical features from a program. This method puts no restrictions on how to logically and algebraically aggregate semantical properties into numerical features. We illustrate our method on the difficult problem of selecting the best possible combination of 88 available optimizations in GCC. We achieve 74% of the potential speedup obtained through iterative compilation on a wide range of benchmarks and four different general-purpose and embedded architectures. Our work is particularly relevant to embedded system designers willing to quickly adapt the optimization heuristics of a mainstream compiler to their custom ISA, microarchitecture, benchmark suite and workload. Our method has been integrated with the publicly released MILEPOST GCC [14].

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems

自引率

0.00%

发文量