Context-Aware Code Change Embedding for Better Patch Correctness Assessment

ACM Transactions on Software Engineering and Methodology (TOSEM). Pub Date: 2022-05-18. DOI: 10.1145/3505247

Bo Lin, Shangwen Wang, Ming Wen, Xiaoguang Mao


Citations: 31

Abstract

Despite their growing ability to fix real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests is actually incorrect). Many approaches have been proposed for automated patch correctness assessment (APCA). Nonetheless, dynamic ones (i.e., those that need to execute tests) are time-consuming, while static ones (i.e., those built on top of static code features) are less precise. Embedding techniques have therefore been proposed recently; they assess patch correctness by embedding token sequences extracted from the changed code of a generated patch. However, existing techniques rarely consider the context information and program structures of a generated patch, which existing studies have shown to be crucial for patch correctness assessment. In this study, we explore the idea of context-aware code change embedding that considers program structures for patch correctness assessment. Specifically, given a patch, we focus not only on the changed code but also on the correlated unchanged part, from which context information can be extracted and leveraged. We then use the AST path technique for representation, which captures the structure information from AST nodes. Finally, based on several pre-defined heuristics, we build a deep learning based classifier to predict the correctness of the patch. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation learning based techniques (e.g., Cache relatively outperforms existing techniques by \( \approx \) 6%, \( \approx \) 3%, and \( \approx \) 16%, respectively, under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while even being more precise than certain dynamic ones, including PATCH-SIM (92.9% vs. 83.0%). Further results reveal that the context information and program structures leveraged by Cache contribute significantly to its outstanding performance.
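To make the AST path idea concrete, the following is a minimal sketch and not Cache's implementation: the abstract does not specify the target language, the heuristics, or the classifier. It uses Python's standard `ast` module purely for illustration. The helper names (`collect_leaves`, `path_between`, `path_contexts`), the choice of `Name` and `Constant` nodes as leaves, and the set difference and intersection used to separate patch-introduced paths from shared context are all assumptions made for this example.

```python
import ast
from itertools import combinations


def collect_leaves(tree):
    """Return (token, trail) pairs; a trail lists (node id, node type) from root to leaf.

    Assumption for this sketch: only Name and Constant nodes count as leaves.
    """
    leaves = []

    def visit(node, trail):
        trail = trail + [(id(node), type(node).__name__)]
        if isinstance(node, ast.Name):
            leaves.append((node.id, trail))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, trail)

    visit(tree, [])
    return leaves


def path_between(trail_a, trail_b):
    """Node types from leaf A up to the lowest common ancestor, then down to leaf B."""
    i = 0
    while i < min(len(trail_a), len(trail_b)) and trail_a[i][0] == trail_b[i][0]:
        i += 1
    up = [name for _, name in reversed(trail_a[i:])]
    top = trail_a[i - 1][1]  # lowest common ancestor
    down = [name for _, name in trail_b[i:]]
    return tuple(up + [top] + down)


def path_contexts(source):
    """Set of (start token, AST path, end token) triples for every pair of leaves."""
    leaves = collect_leaves(ast.parse(source))
    return {
        (a, path_between(trail_a, trail_b), b)
        for (a, trail_a), (b, trail_b) in combinations(leaves, 2)
    }


# Hypothetical buggy and patched versions of the same function.
buggy = "def f(a, b):\n    return a - b\n"
patched = "def f(a, b):\n    return a - b + 1\n"

before, after = path_contexts(buggy), path_contexts(patched)
changed = after - before  # paths introduced by the patch
context = after & before  # paths shared with the unchanged code
```

In a learning pipeline, triples such as those in `changed` and `context` would be embedded and fed to a classifier; that part is omitted here because the abstract gives no details about the model.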



