M. Berezov, Corinne Ancourt, Justyna Zawalska, Maryna Savchenko
PARMA-DITAM@HiPEAC. DOI: https://doi.org/10.4230/OASIcs.PARMA-DITAM.2022.3
COLA-Gen: Active Learning Techniques for Automatic Code Generation of Benchmarks
Benchmarking is crucial in code optimization. Validating optimization techniques or evaluating predictive performance models requires a set of programs that can be considered representative. However, benchmarks for code optimization are in short supply, and the shortage is more pronounced when machine learning techniques are used: these techniques are sensitive to the quality and quantity of their training data, so they need a large number of programs. Our work aims to address these limitations. We present a methodology to efficiently generate benchmarks for the code optimization domain. It includes an automatic code generator, an associated DSL that handles the high-level specification of the desired code, and a strategy for extending the benchmark as needed. The strategy is based on Active Learning techniques and helps to generate the most representative data for our benchmark. We observed that Machine Learning models trained on our benchmark produce better-quality predictions and converge faster. The optimization based on the Active Learning method achieved up to 15% more speed-up than the passive learning method using the same amount of data. The experiments were run on an Intel® Core™ i7-8650U (4C/4T, 1.90 GHz) with L1, L2, and L3 caches of 32 KB, 256 KB, and 8192 KB and 32 GB of DDR4 DIMM RAM, using 4 physical cores and 4 threads, with GCC 5.4.0 at optimization level -O3.
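The abstract does not say which Active Learning query strategy drives the benchmark extension. As a minimal sketch, assuming a pool-based setting with query-by-committee selection, the loop could look like the following. Everything here is hypothetical and not taken from COLA-Gen: the `Spec` encoding (loop depth, array size), the k-nearest-neighbour committee, and the `measure` callback, which stands in for generating, compiling, and timing one candidate program.

```python
from typing import Callable, List, Sequence, Tuple

Spec = Tuple[int, int]        # hypothetical encoding: (loop depth, array size)
Sample = Tuple[Spec, float]   # a specification plus its measured runtime


def knn_predict(train: Sequence[Sample], query: Spec, k: int) -> float:
    """Average the measurements of the k labeled specs nearest to `query`."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)),
    )
    top = nearest[:k]  # fewer than k samples is fine: use what is available
    return sum(y for _, y in top) / len(top)


def disagreement(train: Sequence[Sample], query: Spec,
                 ks: Sequence[int] = (1, 2, 3)) -> float:
    """Committee spread: range of the predictions made with different k."""
    preds = [knn_predict(train, query, k) for k in ks]
    return max(preds) - min(preds)


def extend_benchmark(pool: Sequence[Spec], seed: Sequence[Spec],
                     measure: Callable[[Spec], float],
                     budget: int) -> List[Sample]:
    """Grow the labeled set up to `budget` samples, querying the candidate
    on which the committee disagrees most at every step."""
    labeled: List[Sample] = [(s, measure(s)) for s in seed]
    remaining = [s for s in pool if s not in seed]
    while remaining and len(labeled) < budget:
        chosen = max(remaining, key=lambda s: disagreement(labeled, s))
        remaining.remove(chosen)
        # In a real pipeline this is the expensive step (generate, compile, run).
        labeled.append((chosen, measure(chosen)))
    return labeled


if __name__ == "__main__":
    # Toy stand-in for a runtime measurement; not real benchmark data.
    candidates = [(d, n) for d in (1, 2, 3) for n in (64, 128, 256, 512)]
    picked = extend_benchmark(candidates, [candidates[0], candidates[-1]],
                              lambda s: float(s[0] * s[1]), budget=6)
    for spec, runtime in picked:
        print(spec, runtime)
```

A passive baseline would replace the `max(...)` selection with a random draw from `remaining`; comparing the two under the same `budget` mirrors the "same amount of data" comparison in the abstract. The features are left unscaled for brevity, so the array size dominates the distance.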