ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction

A. F. Silva, Jerônimo Nunes Rocha, B. Guimarães, Fernando Magno Quintão Pereira
Published in: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2021-02-27
DOI: 10.1109/CGO51591.2021.9370322 (https://doi.org/10.1109/CGO51591.2021.9370322)
Citations: 37

Abstract

A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model that determines its actions when it faces unseen code. One of the challenges of predictive compilation is finding good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult because of program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on the Hindley-Milner algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C-compliant compiler. Therefore, they can be used to train compilers for code-size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite that are over 10% smaller than those produced by clang -Oz. It compresses code impervious even to the state-of-the-art Function Sequence Alignment technique published in 2019, as it does not require large binaries to work well.
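To make the idea of a compilable but non-executable benchmark concrete, here is a minimal hypothetical sketch in C. It is not taken from ANGHABENCH, and the abstract does not describe the exact form of the reconstructed declarations; the names packet_t, PKT_URGENT, and packet_priority are invented for illustration. The assumption is that a function mined from a repository refers to a type and a macro whose header is unavailable, and that a type reconstructor supplies declarations consistent with how the function uses them.

```c
/* Hypothetical illustration only; not an actual ANGHABENCH program. */

/* Declarations of the kind a type reconstructor might synthesize.
   In the original repository these would live in a project header
   that is missing when the function is mined in isolation. */
typedef struct packet {
    int length; /* used as an integer operand below */
    int flags;  /* used as a bit mask below */
} packet_t;

#define PKT_URGENT 0x1 /* assumed value; any integer constant would compile */

/* The mined function: the only part that would come from real code. */
int packet_priority(const packet_t *p) {
    if (p->flags & PKT_URGENT)
        return 2 * p->length;
    return p->length;
}
```

Because the file defines no main function, it cannot be linked into an executable on its own, but a command such as clang -Oz -c file.c still turns it into an object file whose size can be measured.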