Structuring Semantic-Aware Relations Between Bugs and Patches for Accurate Patch Evaluation

IF 1.7 · Zone 4 (Computer Science) · JCR Q3 · COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Software-Evolution and Process · Pub Date: 2025-02-02 · DOI: 10.1002/smr.70001

Lingxiao Zhao, Hui Li, Yongqian Chen, Xiaowei Pan, Shikai Guo

Citations: 0

Abstract

Patches can fix security vulnerabilities and optimize software performance, thereby improving software quality and security. Unfortunately, patches generated by automated program repair (APR) tools are not always correct: they may introduce new bugs or fail to fully fix the original issue. Various methods for evaluating patch correctness have been proposed. However, most of them struggle to capture long-distance dependencies, which degrades the predictive performance of the models. To address this challenge, this paper presents Qamhaen, a method for evaluating the correctness of APR-generated patches. Specifically, the text-embedding component for bugs and patches addresses long-distance dependencies across functions by taking bug reports and patch descriptions as inputs instead of code snippets. BERT is employed for pretraining to capture these dependencies, followed by an additional multihead self-attention mechanism for further feature extraction. The similarity evaluator component devises a similarity calculation to assess how effectively a patch description resolves the issues outlined in the bug report. Comprehensive experiments on a dataset of 9135 patches, evaluated with a patch correctness assessment metric, demonstrate that Qamhaen outperforms the baseline methods in overall performance across AUC, F1, +Recall, -Recall, and Precision. For example, Qamhaen achieves an F1 of 0.691, an improvement of 24.2%, 22.1%, and 6.3% over the respective baseline methods.
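
The abstract does not spell out the similarity calculation or the decision rule, so the following is only a minimal sketch of the general idea: score a bug report against a patch description by vector similarity, then report Precision, +Recall, -Recall, and F1. Cosine similarity, the 0.5 threshold, the function names, and the toy two-dimensional vectors (stand-ins for the outputs of BERT plus multihead self-attention) are all assumptions made for illustration, not the authors' implementation. +Recall is taken here as recall on correct patches and -Recall as recall on incorrect patches; AUC is left out for brevity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_correct(bug_vec, patch_vec, threshold=0.5):
    # Hypothetical decision rule: call the patch correct when its description
    # embedding is similar enough to the bug report embedding.
    return cosine_similarity(bug_vec, patch_vec) >= threshold


def _ratio(numerator, denominator):
    # Avoid division by zero when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def evaluate(labels, predictions):
    """Compute Precision, +Recall, -Recall and F1; True means 'correct patch'."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # correct, predicted correct
    tn = sum(1 for y, p in pairs if not y and not p)  # incorrect, predicted incorrect
    fp = sum(1 for y, p in pairs if not y and p)      # incorrect, predicted correct
    fn = sum(1 for y, p in pairs if y and not p)      # correct, predicted incorrect
    precision = _ratio(tp, tp + fp)
    pos_recall = _ratio(tp, tp + fn)  # share of correct patches identified
    neg_recall = _ratio(tn, tn + fp)  # share of incorrect patches filtered out
    f1 = _ratio(2 * precision * pos_recall, precision + pos_recall)
    return {"Precision": precision, "+Recall": pos_recall,
            "-Recall": neg_recall, "F1": f1}


# Toy (bug embedding, patch embedding, is_correct) triples; purely illustrative.
samples = [
    ([1.0, 0.0], [1.0, 0.0], True),
    ([1.0, 0.0], [0.0, 1.0], False),
    ([1.0, 1.0], [1.0, 0.0], True),
    ([1.0, 0.0], [1.0, 1.0], False),
    ([0.0, 1.0], [1.0, 0.0], True),
]
labels = [is_correct for _, _, is_correct in samples]
predictions = [predict_correct(bug, patch) for bug, patch, _ in samples]
metrics = evaluate(labels, predictions)
print(metrics)
```

In a real pipeline the toy vectors would be replaced by learned embeddings, and the threshold would be tuned on held-out data rather than fixed in advance.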

Source journal

Journal of Software-Evolution and Process (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Self-citation rate: 10.00%

Publication volume: 109