利用深度语义特征和迁移知识改进故障定位和程序修复

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE) Pub Date : 2022-05-01 DOI:10.1145/3510003.3510147

Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu

{"title":"利用深度语义特征和迁移知识改进故障定位和程序修复","authors":"Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu","doi":"10.1145/3510003.3510147","DOIUrl":null,"url":null,"abstract":"Automatic software debugging mainly includes two tasks of fault lo-calization and automated program repair. Compared with the traditional spectrum-based and mutation-based methods, deep learning-based methods are proposed to achieve better performance for fault localization. However, the existing methods ignore the deep seman-tic features or only consider simple code representations. They do not leverage the existing bug-related knowledge from large-scale open-source projects either. In addition, existing template-based program repair techniques can incorporate project specific information better than deep-learning approaches. However, they are weak in selecting the fix templates for efficient program repair. In this work, we propose a novel approach called TRANSFER, which lever-ages the deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair. First, we build two large-scale open-source bug datasets and design 11 BiLSTM-based binary classifiers and a BiLSTM-based multi-classifier to learn deep semantic features of statements for fault localization and program repair, respectively. Second, we combine semantic-based, spectrum-based and mutation-based features and use an MLP-based model for fault localization. Third, the semantic-based features are leveraged to rank the fix templates for program repair. Our extensive experiments on widely-used benchmark De-fects4J show that TRANSFER outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair. Compared with the typical template-based work TBar, TRANSFER can correctly repair 6 more bugs (47 in total) on Defects4J.","PeriodicalId":202896,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge\",\"authors\":\"Xiangxin Meng, Xu Wang, Hongyu Zhang, Hailong Sun, Xudong Liu\",\"doi\":\"10.1145/3510003.3510147\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic software debugging mainly includes two tasks of fault lo-calization and automated program repair. Compared with the traditional spectrum-based and mutation-based methods, deep learning-based methods are proposed to achieve better performance for fault localization. However, the existing methods ignore the deep seman-tic features or only consider simple code representations. They do not leverage the existing bug-related knowledge from large-scale open-source projects either. In addition, existing template-based program repair techniques can incorporate project specific information better than deep-learning approaches. However, they are weak in selecting the fix templates for efficient program repair. In this work, we propose a novel approach called TRANSFER, which lever-ages the deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair. First, we build two large-scale open-source bug datasets and design 11 BiLSTM-based binary classifiers and a BiLSTM-based multi-classifier to learn deep semantic features of statements for fault localization and program repair, respectively. Second, we combine semantic-based, spectrum-based and mutation-based features and use an MLP-based model for fault localization. Third, the semantic-based features are leveraged to rank the fix templates for program repair. Our extensive experiments on widely-used benchmark De-fects4J show that TRANSFER outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair. Compared with the typical template-based work TBar, TRANSFER can correctly repair 6 more bugs (47 in total) on Defects4J.\",\"PeriodicalId\":202896,\"journal\":{\"name\":\"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3510003.3510147\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510003.3510147","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 16

摘要

软件自动调试主要包括故障定位和程序自动修复两项任务。与传统的基于谱和基于突变的故障定位方法相比，提出了基于深度学习的故障定位方法。然而，现有的方法忽略了深层语义特征或只考虑简单的代码表示。它们也没有利用来自大型开源项目的现有bug相关知识。此外，现有的基于模板的程序修复技术可以比深度学习方法更好地整合项目特定信息。然而，它们在选择修复模板以进行有效的程序修复方面很薄弱。在这项工作中，我们提出了一种名为TRANSFER的新方法，该方法利用开源数据中的深层语义特征和迁移知识来改进故障定位和程序修复。首先，我们构建了两个大规模的开源bug数据集，设计了11个基于bilstm的二进制分类器和1个基于bilstm的多分类器，分别学习语句的深度语义特征，用于故障定位和程序修复。其次，结合基于语义、基于频谱和基于突变的特征，采用基于mlp的故障定位模型;第三，利用基于语义的特性对程序修复的修复模板进行排序。我们在广泛使用的基准De-fects4J上进行的大量实验表明，TRANSFER在故障定位方面优于所有基线，并且在自动程序修复方面优于现有的深度学习方法。与典型的基于模板的工作TBar相比，TRANSFER可以正确地修复缺陷4j上的6个错误(总共47个)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge

Automatic software debugging mainly includes two tasks of fault lo-calization and automated program repair. Compared with the traditional spectrum-based and mutation-based methods, deep learning-based methods are proposed to achieve better performance for fault localization. However, the existing methods ignore the deep seman-tic features or only consider simple code representations. They do not leverage the existing bug-related knowledge from large-scale open-source projects either. In addition, existing template-based program repair techniques can incorporate project specific information better than deep-learning approaches. However, they are weak in selecting the fix templates for efficient program repair. In this work, we propose a novel approach called TRANSFER, which lever-ages the deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair. First, we build two large-scale open-source bug datasets and design 11 BiLSTM-based binary classifiers and a BiLSTM-based multi-classifier to learn deep semantic features of statements for fault localization and program repair, respectively. Second, we combine semantic-based, spectrum-based and mutation-based features and use an MLP-based model for fault localization. Third, the semantic-based features are leveraged to rank the fix templates for program repair. Our extensive experiments on widely-used benchmark De-fects4J show that TRANSFER outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair. Compared with the typical template-based work TBar, TRANSFER can correctly repair 6 more bugs (47 in total) on Defects4J.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量