Uncertainty Estimation of Automatic Software Debugging in Open-Source Projects Hosting Platform

IF 4.3 · Zone 2, Computer Science · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Consumer Electronics Pub Date : 2025-01-01 DOI:10.1109/TCE.2024.3524511

Hetong Liang;Shikai Guo;Hui Li;Chenchen Li

IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 905-917. https://ieeexplore.ieee.org/document/10819502/

Citations: 0

Abstract

Fault localization and automatic program repair are critical tasks in software debugging. An effective fault localization and repair system helps developers promptly identify and resolve potential issues in their programs, improving development and maintenance efficiency. In automatic software debugging, transfer learning methods that acquire deep semantic features have shown promising results. However, traditional transfer learning methods are susceptible to noisy data in datasets, which degrades the quality of the extracted deep semantic features. To address this limitation, we propose D-Helper, a self-denoising transfer learning model. The model estimates the joint distribution of noisy and true labels to identify and exclude samples whose labels may have been corrupted, thereby mitigating the impact of noisy data on the quality of deep semantic features. D-Helper consists of three main components: a software debugging knowledge learning component, an automatic fault localization component, and an automatic fault repair component. The knowledge learning component employs a self-filtering transfer learning method to acquire deep semantic knowledge efficiently while limiting the influence of noisy data. The fault localization component uses the acquired deep semantic information to localize faults. The fault repair component adopts a template-based repair method, using the acquired deep semantic information to generate a reasonable template selection sequence for automatic repair. Comprehensive experiments on the widely recognized Defects4J benchmark show fault localization Top-1, Top-3, Top-5, and MFR scores of 98, 151, 177, and 76.44, respectively, improvements of 7.0%, 4.8%, 3.5%, and 4.4% over the baseline model. In the repair phase, our results on Defects4J show a 6.4% improvement over the baseline model. D-Helper thus performs well in both fault localization and repair, addressing the challenge of noisy data in deep semantic feature acquisition to improve model accuracy and robustness.
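The abstract does not say how D-Helper estimates the joint distribution of noisy and true labels. As a rough illustration only, the sketch below uses a simplified confident-learning-style procedure, which is an assumption and not the authors' method. Each sample gets a "confident" true label when its predicted probability for a class reaches that class's average self-confidence. The counts of (noisy label, confident label) pairs are then normalized into a joint estimate, and samples that fall off the diagonal are flagged for exclusion. The function names, the threshold rule, and the toy data are all hypothetical.

```python
def class_thresholds(noisy_labels, pred_probs, num_classes):
    """Per-class threshold: mean predicted probability of class j
    over the samples whose (possibly noisy) label is j."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for y, p in zip(noisy_labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no samples gets threshold 1.0, so it is rarely selected.
    return [sums[j] / counts[j] if counts[j] else 1.0
            for j in range(num_classes)]


def estimate_joint_and_suspects(noisy_labels, pred_probs, num_classes):
    """Return (joint, suspects).

    joint[i][j] approximates P(noisy label = i, true label = j).
    suspects lists indices whose confident label disagrees with the
    given label; these are candidates for exclusion from training.
    """
    thresholds = class_thresholds(noisy_labels, pred_probs, num_classes)
    counts = [[0] * num_classes for _ in range(num_classes)]
    suspects = []
    for idx, (y, p) in enumerate(zip(noisy_labels, pred_probs)):
        confident = [j for j in range(num_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no class is predicted confidently; leave uncounted
        guess = max(confident, key=lambda j: p[j])
        counts[y][guess] += 1
        if guess != y:
            suspects.append(idx)
    total = sum(sum(row) for row in counts)
    joint = [[c / total if total else 0.0 for c in row] for row in counts]
    return joint, suspects


# Toy data (hypothetical): sample 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
         [0.2, 0.8], [0.1, 0.9], [0.05, 0.95]]
joint, suspects = estimate_joint_and_suspects(labels, probs, num_classes=2)
clean_indices = [i for i in range(len(labels)) if i not in suspects]
```

On this toy input only sample 2 is flagged, so `clean_indices` keeps the other five samples. A real pipeline would obtain `pred_probs` from out-of-sample model predictions and would typically calibrate the counts against the observed label frequencies, both of which are omitted here.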


Source journal

IEEE Transactions on Consumer Electronics (Engineering & Technology: Telecommunications)

CiteScore

7.70

Self-citation rate

9.30%

Number of articles published

Review time

3.3 months

Journal description: The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, or end use of mass-market electronics, systems, software, and services for consumers.