Uncertainty Estimation of Automatic Software Debugging in Open-Source Projects Hosting Platform
Authors: Hetong Liang; Shikai Guo; Hui Li; Chenchen Li
Journal: IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 905-917 (published 2025-01-01)
DOI: 10.1109/TCE.2024.3524511
URL: https://ieeexplore.ieee.org/document/10819502/
Impact factor: 4.3; JCR: Q1 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
Fault localization and automatic program repair are critical tasks in software debugging. An effective fault localization and repair system helps developers promptly identify and resolve potential issues in programs, improving development and maintenance efficiency. In automatic software debugging, transfer learning methods that acquire deep semantic features have shown promising results. However, traditional transfer learning methods are susceptible to noisy data in datasets, which can degrade the quality of the extracted deep semantic features. To address this limitation, we propose a self-denoising transfer learning model, D-Helper. The model estimates the joint distribution of noisy and true labels to identify and exclude samples whose labels may have been corrupted, thereby mitigating the impact of noisy data on the quality of deep semantic features. D-Helper consists of three main components: a software debugging knowledge learning component, an automatic fault localization component, and an automatic fault repair component. The knowledge learning component employs a self-filtering transfer learning method to acquire deep semantic knowledge efficiently while filtering out noisy samples. The fault localization component uses the acquired deep semantic information to localize faults. The fault repair component adopts a template-based repair method, using the deep semantic information to generate a reasonable template selection sequence for automatic fault repair. Comprehensive experiments on the widely recognized Defects4J benchmark show improved fault localization: Top-1/3/5 and MFR scores of 98, 151, 177, and 76.44, respectively, which are improvements of 7.0%, 4.8%, 3.5%, and 4.4% over the baseline model. In the repair phase, our results on Defects4J show a 6.4% improvement over the baseline model. D-Helper thus performs well in both fault localization and repair, addressing the challenge of noisy data in deep semantic acquisition to improve model accuracy and robustness.
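The abstract does not say how D-Helper estimates the joint distribution of noisy and true labels. As a rough illustration of the general idea only, the sketch below follows a confident-learning-style count: each class gets a threshold equal to the mean predicted probability among samples carrying that label, and a sample is tallied under the most probable class that clears its threshold. The function name `estimate_joint_and_flag`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def estimate_joint_and_flag(
    noisy_labels: List[int],
    pred_probs: List[List[float]],
    num_classes: int,
) -> Tuple[List[List[float]], List[int]]:
    """Estimate the joint distribution of (noisy label, likely true label)
    and return the indices of samples whose labels look corrupted.

    Illustrative assumption, not the paper's algorithm.
    """
    n = len(noisy_labels)

    # Per-class threshold: mean predicted probability of class j over the
    # samples whose given (noisy) label is j.
    thresholds = []
    for j in range(num_classes):
        members = [pred_probs[i][j] for i in range(n) if noisy_labels[i] == j]
        thresholds.append(sum(members) / len(members) if members else 1.0)

    counts = [[0] * num_classes for _ in range(num_classes)]
    suspects = []
    for i in range(n):
        # Classes whose predicted probability clears the class threshold.
        confident = [
            j for j in range(num_classes) if pred_probs[i][j] >= thresholds[j]
        ]
        if not confident:
            continue  # no confident guess: the sample is left out of the counts
        likely_true = max(confident, key=lambda j: pred_probs[i][j])
        counts[noisy_labels[i]][likely_true] += 1
        if likely_true != noisy_labels[i]:
            suspects.append(i)

    # Normalize counts so the entries sum to 1 (rows: noisy, columns: likely true).
    total = sum(sum(row) for row in counts)
    joint = [[c / total if total else 0.0 for c in row] for row in counts]
    return joint, suspects


if __name__ == "__main__":
    # Toy data: sample 2 is labeled 0 but the model strongly predicts class 1.
    labels = [0, 0, 0, 1, 1, 1]
    probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
             [0.1, 0.9], [0.15, 0.85], [0.4, 0.6]]
    joint, suspects = estimate_joint_and_flag(labels, probs, num_classes=2)
    print(joint)
    print(suspects)  # indices of samples to exclude before further training
```

In a pipeline like the one the abstract describes, the flagged indices would be dropped before the transfer learning step; the off-diagonal entries of the estimated joint give a rough sense of how much label noise each class carries.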
About the journal:
The main focus for the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.