ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction

2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). Publication date: 2021-02-27. DOI: 10.1109/CGO51591.2021.9370322

A. F. Silva, Jerônimo Nunes Rocha, B. Guimarães, Fernando Magno Quintão Pereira


Citations: 37

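Abstract

A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model that determines its actions when it faces unseen code. One of the challenges of predictive compilation is finding good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult because of program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on the Hindley-Milner algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C-compliant compiler. Therefore, they can be used to train compilers for code-size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite that are over 10% smaller than those produced by clang -Oz. It compresses code impervious even to the state-of-the-art Function Sequence Alignment technique published in 2019, as it does not require large binaries to work well.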
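To make the idea of a compilable but non-executable benchmark concrete, the following is a minimal sketch of what such a file might look like. It is not taken from ANGHABENCH: the function `queue_push`, the struct `packet_queue`, its fields, and the constant `MAX_QUEUE_LEN` are all hypothetical names chosen for illustration. The sketch assumes a function mined from a repository without its headers, with the missing declarations reconstructed from how the function uses them.

```c
#include <stddef.h>

/* Reconstructed declarations (hypothetical). In the mined source these
 * would live in project headers that were not retrieved; here they are
 * guessed from the way the function below uses them. */
#define MAX_QUEUE_LEN 64   /* assumed value: only used in an integer comparison */

struct packet_queue {
    int len;      /* inferred as int: compared against an integer constant */
    int dropped;  /* inferred as int: incremented with ++ */
};

/* Mined function (hypothetical). There is no main(), so the file can be
 * compiled to an object file but does not form a runnable program. */
int queue_push(struct packet_queue *q)
{
    if (q == NULL)
        return -1;                /* invalid queue */
    if (q->len >= MAX_QUEUE_LEN) {
        q->dropped++;             /* queue full: count the dropped packet */
        return 0;
    }
    q->len++;                     /* room available: accept the packet */
    return 1;
}
```

Compiling such a file with `clang -Oz -c` produces an object file whose size can be measured, which is all that code-size training needs; no entry point or linking step is required.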


