Better test cases for better automated program repair

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. Pub Date: 2017-08-21. DOI: 10.1145/3106237.3106274

Jinqiu Yang, Alexey Zhikhartsev, Yuefei Liu, Lin Tan


Citations: 117

Abstract

Automated generate-and-validate program repair techniques (G&V techniques) generate many overfitted patches because the available test cases are inadequate. Overfitted patches are incorrect patches that make all given test cases pass but fail to fix the bug. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad improves G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases and employs two test oracles (crash and memory-safety) to strengthen the validity checking of automatically generated patches. Opad also uses a novel metric, O-measure, to decide whether an automatically generated patch overfits. Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically generated test cases could further improve G&V techniques if paired with better test oracles (in addition to the crash and memory-safety oracles employed by Opad).
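The filtering idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: programs are modeled as plain Python callables, a raised exception stands in for the crash oracle, the byte-flipping fuzzer is a toy, and the names `crashes`, `fuzz_inputs`, `new_crash_count`, and `filter_patches` are hypothetical. The rule of rejecting a patch that crashes on an input the unpatched program handles is an assumption made for illustration; the abstract does not define O-measure, so it is not reproduced here.

```python
import random
from typing import Callable, Iterable, Iterator, List

# Assumption: a "program" is a callable that takes raw input bytes.
Program = Callable[[bytes], object]


def crashes(program: Program, data: bytes) -> bool:
    """Crash oracle stand-in: an uncaught exception counts as a crash."""
    try:
        program(data)
    except Exception:
        return True
    return False


def fuzz_inputs(seeds: List[bytes], rounds: int, rng: random.Random) -> Iterator[bytes]:
    """Toy fuzzer: pick a seed input and overwrite one random byte."""
    for _ in range(rounds):
        mutated = bytearray(rng.choice(seeds))
        if mutated:
            mutated[rng.randrange(len(mutated))] = rng.randrange(256)
        yield bytes(mutated)


def new_crash_count(buggy: Program, patched: Program, inputs: Iterable[bytes]) -> int:
    """Count inputs the unpatched program survives but the patched one crashes on."""
    return sum(
        1 for data in inputs
        if not crashes(buggy, data) and crashes(patched, data)
    )


def filter_patches(buggy: Program, patches: List[Program], inputs: Iterable[bytes]) -> List[Program]:
    """Keep only patches that introduce no new crash on the generated inputs."""
    cases = list(inputs)  # materialize once so every patch sees the same inputs
    return [p for p in patches if new_crash_count(buggy, p, cases) == 0]
```

A caller would generate inputs once, for example `fuzz_inputs([b"seed"], 1000, random.Random(0))`, and pass them together with the candidate patches to `filter_patches`. A memory-safety oracle would need an instrumented native build and is outside the scope of this sketch.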



