Uncertainty Estimation of Automatic Software Debugging in Open-Source Projects Hosting Platform

IF 4.3 · CAS Tier 2 (Computer Science) · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Hetong Liang, Shikai Guo, Hui Li, Chenchen Li
IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 905-917, 2025. DOI: 10.1109/TCE.2024.3524511
https://ieeexplore.ieee.org/document/10819502/
Citations: 0

Abstract

Fault localization and automatic program repair are critical tasks in software debugging. An effective fault localization and automatic repair system can help developers promptly identify and resolve potential issues in their programs, improving development and maintenance efficiency. In automatic software debugging, using transfer learning to acquire deep semantic features has shown promising results. However, traditional transfer learning methods are susceptible to noisy data in their datasets, which degrades the quality of the extracted deep semantic features. To address this limitation, we propose D-Helper, a self-denoising transfer learning model. D-Helper estimates the joint distribution of noisy and true labels to identify and exclude samples whose labels may have been corrupted, thereby mitigating the impact of noisy data on the quality of the deep semantic features. D-Helper consists of three main components: a software debugging knowledge learning component, an automatic fault localization component, and an automatic fault repair component. The knowledge learning component uses a self-filtering transfer learning method to acquire deep semantic knowledge efficiently while mitigating the impact of noisy data. The fault localization component uses the acquired deep semantic information to localize faults. The fault repair component adopts a template-based repair method, using the acquired deep semantic information to generate a template selection sequence that guides efficient automatic repair. Comprehensive experiments on the widely recognized Defects4J benchmark show significant improvements in fault localization: Top-1, Top-3, and Top-5 scores of 98, 151, and 177, and an MFR (mean first rank) of 76.44, which are improvements of 7.0%, 4.8%, 3.5%, and 4.4%, respectively, over the baseline model.
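The Top-N and MFR figures above are standard fault-localization metrics. The sketch below assumes their usual definitions, which this abstract does not restate: for each bug, take the 1-based rank of the first truly faulty program element in the suspiciousness ranking; Top-N counts the bugs whose first faulty element is ranked within the top N (higher is better), and MFR is the mean of those first ranks (lower is better, so an MFR improvement is a decrease). Excluding bugs whose faulty element is never ranked from the MFR average is an assumption here, since evaluations differ on this point. The function names, bug IDs, and method names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Set, Tuple


def first_fault_rank(ranking: List[str], faulty: Set[str]) -> Optional[int]:
    """1-based position of the first truly faulty element in a suspiciousness ranking.

    Returns None when no faulty element appears in the ranking at all.
    """
    for position, element in enumerate(ranking, start=1):
        if element in faulty:
            return position
    return None


def top_n(first_ranks: List[Optional[int]], n: int) -> int:
    """Number of bugs whose first faulty element is ranked within the top n."""
    return sum(1 for r in first_ranks if r is not None and r <= n)


def mean_first_rank(first_ranks: List[Optional[int]]) -> float:
    """Mean first rank (MFR) over bugs that were localized at all (assumption: others excluded)."""
    located = [r for r in first_ranks if r is not None]
    return mean(located) if located else float("inf")


# Hypothetical data: bug id -> (methods ranked by suspiciousness, set of truly faulty methods).
bugs: Dict[str, Tuple[List[str], Set[str]]] = {
    "bug-a": (["m3", "m1", "m7"], {"m1"}),
    "bug-b": (["m2", "m5"], {"m2"}),
    "bug-c": (["m4", "m6", "m8", "m9"], {"m9"}),
    "bug-d": (["m0"], {"m10"}),  # the faulty method is never ranked
}

ranks = [first_fault_rank(ranking, faulty) for ranking, faulty in bugs.values()]
scores = {f"Top-{n}": top_n(ranks, n) for n in (1, 3, 5)}
scores["MFR"] = round(mean_first_rank(ranks), 2)
print(scores)  # {'Top-1': 1, 'Top-3': 2, 'Top-5': 3, 'MFR': 2.33}
```

Top-N is cumulative, so Top-1 ≤ Top-3 ≤ Top-5 always holds, as in the reported 98, 151, and 177.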
In the repair phase, our results on Defects4J show a 6.4% improvement over the baseline model. D-Helper therefore performs well in both fault localization and repair, addressing the challenge of noisy data in deep semantic feature acquisition and improving model accuracy and robustness.
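The abstract says D-Helper estimates the joint distribution of noisy and true labels to find and exclude samples whose labels may be corrupted, but it does not give the estimator. One standard way to do this is confident learning (Northcutt et al.), sketched below as an illustration rather than as D-Helper's implementation. Assumptions: a classifier has already produced out-of-sample predicted probabilities `probs` for every training sample, labels are integers `0..k-1`, the data are hypothetical, and all function names are hypothetical. Per-class thresholds `t_j` decide which samples count toward the confident joint `C[given][true]`; calibrating and normalizing `C` gives the estimated joint distribution, and samples that land off its diagonal are flagged and dropped.

```python
from typing import List, Tuple

Probs = List[List[float]]  # probs[i][j]: predicted probability that sample i belongs to class j


def per_class_thresholds(labels: List[int], probs: Probs, k: int) -> List[float]:
    """t_j = mean predicted probability of class j over samples whose given label is j."""
    thresholds = []
    for j in range(k):
        own = [p[j] for y, p in zip(labels, probs) if y == j]
        thresholds.append(sum(own) / len(own) if own else 1.0)
    return thresholds


def confident_joint(labels: List[int], probs: Probs, k: int) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Count C[given][estimated_true] and record (sample index, estimated true label)."""
    t = per_class_thresholds(labels, probs, k)
    counts = [[0] * k for _ in range(k)]
    assigned: List[Tuple[int, int]] = []
    for idx, (y, p) in enumerate(zip(labels, probs)):
        # Only classes whose probability clears their own threshold are candidate true labels.
        candidates = [j for j in range(k) if p[j] >= t[j]]
        if not candidates:
            continue  # not confidently attributable to any class; left out of C
        j_star = max(candidates, key=lambda j: p[j])
        counts[y][j_star] += 1
        assigned.append((idx, j_star))
    return counts, assigned


def estimate_joint(counts: List[List[int]], labels: List[int], k: int) -> List[List[float]]:
    """Calibrate each row of C to the observed count of its given label, then normalize to sum to 1."""
    calibrated = [[0.0] * k for _ in range(k)]
    for i in range(k):
        row_total = sum(counts[i])
        if row_total:
            calibrated[i] = [c / row_total * labels.count(i) for c in counts[i]]
    total = sum(map(sum, calibrated)) or 1.0  # guard against an all-empty matrix
    return [[q / total for q in row] for row in calibrated]


def find_label_issues(labels: List[int], probs: Probs, k: int) -> List[int]:
    """Indices whose given label disagrees with the confidently estimated true label."""
    _, assigned = confident_joint(labels, probs, k)
    return sorted(idx for idx, j_star in assigned if j_star != labels[idx])


# Hypothetical binary-labelled training samples with out-of-sample model probabilities.
labels = [0, 0, 0, 1, 1, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.7, 0.3]]

counts, _ = confident_joint(labels, probs, k=2)
joint = estimate_joint(counts, labels, k=2)
issues = find_label_issues(labels, probs, k=2)
clean_indices = [i for i in range(len(labels)) if i not in issues]

print(counts)                                         # [[2, 1], [1, 2]]
print([[round(q, 3) for q in row] for row in joint])  # [[0.333, 0.167], [0.167, 0.333]]
print(issues, clean_indices)                          # [2, 5] [0, 1, 3, 4]
```

Samples 2 and 5 are flagged because the model confidently assigns them the other class; a denoising pipeline in the spirit of the abstract would then train only on `clean_indices`. Flagging every off-diagonal sample is the simplest pruning rule; variants that remove a per-class number of samples derived from the estimated noise rates are also common.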
Source journal: IEEE Transactions on Consumer Electronics
CiteScore: 7.70
Self-citation rate: 9.30%
Articles published: 59
Review time: 3.3 months
Journal description: The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, or end use of mass-market electronics, systems, software, and services for consumers.