CGMBL: Combining GAN and Method Name for Bug Localization

2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS). Pub Date: 2022-12-01. DOI: 10.1109/QRS57517.2022.00033

Hao Chen, Haiyang Yang, Zilun Yan, Li Kuang, Lingyan Zhang



Abstract

Developers often need to locate buggy code files during software quality maintenance. Bug localization aims to automatically identify potentially buggy source code files in a project based on bug reports. Researchers have proposed many methods for this task. However, early studies focus only on the accuracy of capturing text features or the efficiency of calculating relevance scores, and do not consider the semantic gap between bug reports written in natural language and code written in a programming language. In this paper, we propose a novel adversarial learning model to bridge this semantic gap. Because natural language and programming languages have different characteristics, we propose two different representation models, one for bug reports and one for code files, and regard both as generators. We then construct adversarial learning by adding a discriminator that distinguishes the source of each representation, so that the model learns the features shared by the two kinds of text. In addition, a method name summarizes the function of its code, and relevant method names often appear in bug reports. We take method name information into account according to whether the method name appears in the report, and our model dynamically integrates this information to improve its performance. We evaluate our model on three open-source Java project datasets and compare it with four state-of-the-art methods. The experimental results show that our model outperforms the baseline models, with a significant improvement in the evaluation metrics. We also conduct ablation experiments to explain each module's contribution to the model.
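The method-name idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the function names (`extract_method_names`, `mentioned_methods`, `combined_score`), and the fixed `boost` value are all hypothetical, and the fixed additive boost is only a stand-in for the dynamic integration the paper describes. The sketch assumes a relevance score for a (bug report, code file) pair is already available from some other model.

```python
from __future__ import annotations

import re

# Hypothetical pattern for simple Java method declarations.
# It is a rough heuristic, not a Java parser: it skips constructors,
# package-private methods, and many less common declaration forms.
METHOD_DECL = re.compile(
    r"\b(?:public|protected|private)\s+(?:static\s+)?(?:final\s+)?"
    r"[\w<>\[\]]+\s+(\w+)\s*\("
)

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def extract_method_names(java_source: str) -> set[str]:
    """Collect names of methods declared in a Java source string."""
    return set(METHOD_DECL.findall(java_source))


def mentioned_methods(report: str, method_names: set[str]) -> set[str]:
    """Return the method names that occur as whole tokens in the bug report."""
    return method_names & set(IDENTIFIER.findall(report))


def combined_score(semantic_score: float, report: str, java_source: str,
                   boost: float = 0.2) -> float:
    """Add a bonus only when the report mentions a method of this file.

    Assumption: a constant bonus replaces the learned, dynamic
    integration mentioned in the abstract.
    """
    hits = mentioned_methods(report, extract_method_names(java_source))
    return semantic_score + (boost if hits else 0.0)
```

A file whose methods are named in the report is ranked higher than its semantic score alone would place it, while files with no matching method name keep their original score.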


