More Reliable Test Suites for Dynamic APR by using Counterexamples

Amirfarhad Nilizadeh, Marlon Calvo, Gary T. Leavens, X. Le
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), October 2021.
DOI: 10.1109/ISSRE52982.2021.00032
Citations: 8

Abstract

Dynamic automated program repair (APR) techniques, which use test suites for bug localization and for evaluating candidate patches, have shown promising results. However, many studies show that patches generated by dynamic APR tools are not always reliable. Recent studies show that enhancing test suites by adding tests helps dynamic APR tools generate more reliable patches. We evaluate the effectiveness of minimally enhancing test suites by adding counterexamples for repaired programs that suffer from test overfitting. We use formal methods as an independent standard for evaluating patches' correctness and for generating counterexamples. Techniques for evaluating patch correctness (both with human reviewers and with formal methods) can produce false negatives, meaning that the repaired program is correct but is deemed incorrect. A counterexample is a good way to check reviewer decisions about correctness. Our study evaluated 256 repaired but not verified programs (from the buggy Java+JML dataset); the repairs were generated by seven state-of-the-art dynamic APR tools. Our results show that the counterexamples generated by the OpenJML tool could correctly classify all of these programs into the categories of "test overfitting" and "false negatives." After adding tests based on the counterexamples to the test suites, we ran the APR tools on the original buggy programs again and found that: (1) the APR tools were able to generate about 27.3% more correct patches with the enhanced test suite, and (2) the enhanced test suite resulted in the APR tools generating about 83.6% fewer overfitted patches.
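To make the overfitting scenario concrete, here is a minimal illustrative sketch (not taken from the paper; all names and the hard-coded patch are hypothetical). It shows a JML-specified method, an overfitted machine-generated "patch" that satisfies a weak test suite, and a counterexample-derived test of the kind that could be added to the suite to reject that patch on a subsequent repair run.

```java
// Hypothetical sketch: overfitted patch vs. a counterexample-derived test.
public class AbsDemo {
    // JML specification the patch should satisfy:
    //@ ensures \result >= 0 && (\result == x || \result == -x);
    // Overfitted "patch": passes a weak test suite that only exercises
    // x = 5 and x = -3, but violates the ensures clause for other negatives.
    static int absPatched(int x) {
        if (x == -3) return 3;   // hard-coded to satisfy the original tests
        return x;                // wrong for every other negative input
    }

    public static void main(String[] args) {
        // Original (weak) test suite: the overfitted patch passes both.
        assert absPatched(5) == 5 : "weak test 1";
        assert absPatched(-3) == 3 : "weak test 2";

        // Counterexample (the kind a verifier such as OpenJML could report
        // for the violated ensures clause): input x = -7 yields a negative
        // result. Encoded as a test, it exposes the overfitted patch.
        boolean specHolds = absPatched(-7) >= 0;
        System.out.println("spec holds on counterexample input: " + specHolds);
    }
}
```

Adding the counterexample input as a regular test strengthens the suite, so when the APR tool is rerun on the original buggy program, the hard-coded patch above no longer passes validation.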