Test-based patch clustering for automatically-generated patches assessment

IF 3.6 · Zone 2, Computer Science · Q1 Computer Science, Software Engineering

Empirical Software Engineering Pub Date : 2024-07-24 DOI:10.1007/s10664-024-10503-2

Matias Martinez, Maria Kechagia, Anjana Perera, Justyna Petke, Federica Sarro, Aldeida Aleti


Citations: 0

Abstract

Previous studies have shown that Automated Program Repair (APR) techniques suffer from overfitting. A patch overfits when it passes the test suite but either does not actually fix the underlying bug or introduces a new defect that the test suite does not cover. Patches generated by APR tools therefore need to be validated by human programmers, which can be very costly and hinders the adoption of APR tools in practice. Our work aims to minimize the number of plausible patches that programmers have to review, thereby reducing the time required to find a correct patch. We introduce a novel lightweight test-based patch clustering approach called xTestCluster, which clusters patches based on their dynamic behavior. xTestCluster is applied after the patch generation phase to analyze the patches generated by one or more repair tools and to provide more information about those patches, facilitating patch assessment. The novelty of xTestCluster lies in using information from the execution of newly generated test cases to cluster patches generated by multiple APR approaches. A cluster is formed of patches that fail on the same generated test cases. The output of xTestCluster gives developers (a) a way of reducing the number of patches to analyze, as they can focus on a sample of patches from each cluster, and (b) additional information (new test cases and their results) attached to each patch. After analyzing 902 plausible patches from 21 Java APR tools, our results show that xTestCluster reduces the number of patches to review and analyze by a median of 50%. xTestCluster can save a significant amount of time for developers who have to review the multitude of patches generated by APR tools, and it provides them with new test cases that expose the differences in behavior between generated patches. Moreover, xTestCluster can complement other patch assessment techniques that help detect patch misclassifications.
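The clustering rule stated in the abstract (patches that fail on the same generated test cases form one cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the patch identifiers, test names, and both function names are hypothetical, and the reduction figure assumes that a reviewer inspects exactly one patch per cluster, which the abstract does not specify.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List


def cluster_patches(failing_tests: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group patch ids whose sets of failing generated tests are identical."""
    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for patch_id, failed in failing_tests.items():
        # The set of failing generated tests acts as the cluster key.
        clusters[frozenset(failed)].append(patch_id)
    return [sorted(members) for members in clusters.values()]


def review_reduction(num_patches: int, num_clusters: int) -> float:
    """Fraction of patches skipped if one patch per cluster is reviewed (assumption)."""
    if num_patches == 0:
        return 0.0
    return 1.0 - num_clusters / num_patches


# Hypothetical data for one bug: patch id -> generated tests that the patch fails.
results = {
    "patch_a": ["test_gen_3"],
    "patch_b": ["test_gen_3"],
    "patch_c": [],
    "patch_d": ["test_gen_1", "test_gen_3"],
}

clusters = cluster_patches(results)
reduction = review_reduction(len(results), len(clusters))
print(clusters, reduction)
```

In this toy input, patch_a and patch_b behave identically on the generated tests and share a cluster, so a reviewer would look at three patches instead of four. Using a frozenset as the key makes the grouping independent of the order in which failing tests are reported.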




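Source journal

Empirical Software Engineering (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 8.50

Self-citation rate: 12.20%

Articles published: 169

Review time: >12 weeks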

Journal description: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies (processes, methods, or tools) and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.