Jotai: A methodology for the generation of executable C benchmarks

Cecília Conde Kind, Michael Canesche, Fernando Magno Quintão Pereira

Journal of Computer Languages, Volume 85, Article 101368. DOI: 10.1016/j.cola.2025.101368. Published 2025-09-25.
This paper presents a methodology for automatically generating well-defined executable benchmarks in C. The generation process is fully automatic: C files are extracted from open-source repositories and split into compilation units. A type reconstructor infers all the types and declarations required to ensure that functions compile. The generation of inputs is guided by constraints specified via a domain-specific language. This DSL refines the types of functions, for instance, creating relations between integer arguments and the lengths of buffers. Off-the-shelf tools such as AddressSanitizer and Kcc filter out programs with undefined behavior. To demonstrate applicability, this paper analyzes the dynamic behavior of different collections of benchmarks, some with up to 30 thousand samples, to support several observations: (i) the speedup of optimizations does not follow a normal distribution—a property assumed by statistical tests such as the T-test and the Z-test; (ii) there is a strong correlation between the number of instructions fetched and running time on x86 and ARM processors; hence, the former—a non-varying quantity—can be used as a proxy for the latter—a varying quantity—in the autotuning of compilation tasks. The apparatus to generate benchmarks is publicly available. A collection of 18 thousand programs thus produced is also available as a CompilerGym dataset.