Hao Chen, Haiyang Yang, Zilun Yan, Li Kuang, Lingyan Zhang
{"title":"CGMBL: Combining GAN and Method Name for Bug Localization","authors":"Hao Chen, Haiyang Yang, Zilun Yan, Li Kuang, Lingyan Zhang","doi":"10.1109/QRS57517.2022.00033","DOIUrl":null,"url":null,"abstract":"Developers often need to locate buggy code files in the software quality maintenance process. Bug localization aims to automatically identify potentially buggy source code files from the project codes for developers based on the bug reports. Up to now, researchers have proposed many methods to advance this task. However, the early studies only focus on the accuracy of capturing text features or the efficiency of calculating relevance scores, which do not consider the semantic gap between bug reports in natural language and codes in programming language. In this paper, we propose a novel adversarial learning model to bridge the semantic gap. Due to the different characteristics of natural language and programming language, we propose two different representation models for bug reports and code files respectively, and regards the two representation models as the generators. Then we construct adversarial learning by adding a discriminator to distinguish the source of representations so that the model can learn the public features of different texts. In addition, method name is the summary of the code function, and the relevant method name often appears in the bug report. We consider the method name information according to whether the method name appears in the report. Our model can dynamically integrate the information to improve the model effect. We evaluate our model on three open-source java project datasets and compare it with four state-of-the-art methods. The experimental results show that our model outperforms the baseline models and has a significant improvement in evaluation metrics. Besides, we conduct ablation experiments to explain each module’s contribution to the model.","PeriodicalId":143812,"journal":{"name":"2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QRS57517.2022.00033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Developers often need to locate buggy code files in the software quality maintenance process. Bug localization aims to automatically identify potentially buggy source code files from the project codes for developers based on the bug reports. Up to now, researchers have proposed many methods to advance this task. However, the early studies only focus on the accuracy of capturing text features or the efficiency of calculating relevance scores, which do not consider the semantic gap between bug reports in natural language and codes in programming language. In this paper, we propose a novel adversarial learning model to bridge the semantic gap. Due to the different characteristics of natural language and programming language, we propose two different representation models for bug reports and code files respectively, and regards the two representation models as the generators. Then we construct adversarial learning by adding a discriminator to distinguish the source of representations so that the model can learn the public features of different texts. In addition, method name is the summary of the code function, and the relevant method name often appears in the bug report. We consider the method name information according to whether the method name appears in the report. Our model can dynamically integrate the information to improve the model effect. We evaluate our model on three open-source java project datasets and compare it with four state-of-the-art methods. The experimental results show that our model outperforms the baseline models and has a significant improvement in evaluation metrics. Besides, we conduct ablation experiments to explain each module’s contribution to the model.