Boosting Compiler Testing via Compiler Optimization Exploration

ACM Transactions on Software Engineering and Methodology (TOSEM). Pub Date: 2022-03-05. DOI: 10.1145/3508362

Junjie Chen, Chenyao Suo


Citations: 13

Abstract

Compilers are critical software, and, as with other software, testing is one of the most widely used ways of assuring their quality. Compiler bugs tend to occur in optimizations. Detecting an optimization bug depends on two main factors: (1) the optimization flags that control access to the buggy compiler code must be turned on; and (2) the test program must be able to trigger the buggy code. However, existing compiler testing approaches consider only the latter when generating effective test programs, and simply run them under a few predefined optimization levels (e.g., -O0, -O1, -O2, -O3, -Os in GCC). To better understand the influence of compiler optimizations on compiler testing, we conduct the first empirical study and find that (1) all the bugs detected under the widely used optimization levels are also detected under the explored optimization settings (we call a combination of optimization flags turned on for compilation an optimization setting), while 83.54% of bugs are detected only under the latter; and (2) optimization flags exhibit both inhibition and promotion effects on one another in compiler testing, which indicates both the necessity and the challenge of considering compiler optimizations in compiler testing. We then propose the first approach, called COTest, that considers both factors to test compilers. Specifically, COTest first uses machine learning (the XGBoost algorithm) to model the relationship between test programs and optimization settings, in order to predict the bug-triggering probability of a test program under an optimization setting. It then applies a diversity augmentation strategy to select a set of diverse candidate optimization settings for prediction for a test program. Finally, the Top-K optimization settings are selected for compiler testing according to the predicted bug-triggering probabilities. Experiments on GCC and LLVM demonstrate its effectiveness; in particular, COTest detects 17 previously unknown bugs, 11 of which have been fixed or confirmed by developers.
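The three steps named in the abstract (predict, diversify, select the Top-K) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the short `FLAGS` list, the bit-vector encoding of an optimization setting, the greedy max-min Hamming selection standing in for the diversity augmentation strategy, the `toy_predict` function standing in for the trained XGBoost model, and the `-O1` base level in `to_command`.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical subset of GCC optimization flags; the real flag space is far larger.
FLAGS = ["-ftree-vectorize", "-finline-functions", "-fgcse",
         "-fipa-cp", "-funroll-loops", "-fpeel-loops"]

# An optimization setting is modelled as one bit per flag (1 = turned on).
Setting = Tuple[int, ...]


def hamming(a: Setting, b: Setting) -> int:
    """Number of flags on which two settings differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_candidates(pool: Sequence[Setting], n: int) -> List[Setting]:
    """Greedy max-min selection: repeatedly add the setting farthest from
    those already chosen. A stand-in for the paper's diversity strategy."""
    unique = list(dict.fromkeys(pool))  # drop duplicates, keep order
    if not unique or n <= 0:
        return []
    chosen = [unique[0]]
    while len(chosen) < min(n, len(unique)):
        remaining = [s for s in unique if s not in chosen]
        chosen.append(max(remaining,
                          key=lambda s: min(hamming(s, c) for c in chosen)))
    return chosen


def toy_predict(features: Sequence[float], setting: Setting) -> float:
    """Placeholder for the learned model: returns a score in [0, 1].
    In COTest this role is played by an XGBoost model."""
    total = sum(features) or 1.0
    return sum(w * bit for w, bit in zip(features, setting)) / total


def top_k(features: Sequence[float], candidates: Sequence[Setting],
          predict: Callable[[Sequence[float], Setting], float],
          k: int) -> List[Setting]:
    """Keep the k candidates with the highest predicted probability."""
    ranked = sorted(candidates, key=lambda s: predict(features, s), reverse=True)
    return ranked[:k]


def to_command(setting: Setting, source: str = "test.c") -> List[str]:
    """Render a setting as a GCC command line. The -O1 base is an assumption:
    GCC ignores most individual optimization flags at -O0."""
    enabled = [flag for flag, bit in zip(FLAGS, setting) if bit]
    return ["gcc", "-O1", *enabled, source]
```

In this sketch, a test program would be reduced to a feature vector, `diverse_candidates` would pick a spread of settings from a sampled pool, and `top_k` would keep the few settings worth the cost of an actual compile-and-run. The point of the ranking step is budget: compiling one program under every flag combination is infeasible, so only the most promising settings are executed.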



