A Novel Automatic Query Expansion with Word Embedding for IR-based Bug Localization
Misoo Kim, Youngkyoung Kim, Eunseok Lee
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 2021-10-01
DOI: 10.1109/ISSRE52982.2021.00038
Information retrieval-based bug localization (IRBL) aims at finding buggy files using a bug report as a query. IRBL performance is highly dependent on query quality. To improve query quality for IRBL, automatic query expansion (AQE) methods have been proposed that identify query-related terms from the first-retrieved source files. This approach inevitably depends on two determinants of the post-retrieval results: the retrieval model and the initial query quality. We propose a novel word embedding-based AQE technique, WEQE, to avoid this heavy dependency of the current AQE approach. A word embedding model makes it possible to fetch terms semantically related to a query by representing words in a vector space. Our method embeds words from both a global corpus and a project-specific corpus. The initial query is extended by adding words that are semantically similar to it, based on vector representations from our embedding model. We validated the effectiveness of WEQE using 4,583 bug reports from seven projects, four IRBL models, and two embedding models. Our large-scale experimental results show that WEQE can improve the average precision of bug localization for at least 42% of all queries. Our expanded queries on the best IRBL model achieve a 6% higher mean average precision for bug localization than the initial queries.
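The idea of extending a query with semantically similar words can be illustrated with a minimal sketch. It is not the WEQE implementation: the abstract does not specify the similarity measure or the selection rule, so cosine similarity against the averaged query vector and a fixed top-k cutoff are assumptions made here. The function names `cosine` and `expand_query` and the hand-written `TOY_EMBEDDINGS` table are hypothetical; the table merely stands in for vectors that would be learned from the global and project-specific corpora.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2):
    """Append the k vocabulary words closest to the query centroid.

    Assumption: the query is represented by the mean of its known word
    vectors; the paper may combine or rank candidates differently.
    """
    known = [t for t in query_terms if t in embeddings]
    if not known:
        # Nothing to compare against, so the query is returned unchanged.
        return list(query_terms)

    dim = len(embeddings[known[0]])
    centroid = [
        sum(embeddings[t][i] for t in known) / len(known) for i in range(dim)
    ]

    # Score every vocabulary word that is not already in the query.
    candidates = [
        (cosine(centroid, vec), term)
        for term, vec in embeddings.items()
        if term not in query_terms
    ]
    # Highest similarity first; ties are broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return list(query_terms) + [term for _, term in candidates[:k]]


# Hypothetical 3-dimensional vectors, chosen by hand for illustration only.
TOY_EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "nullpointer": [0.85, 0.15, 0.05],
    "button": [0.1, 0.9, 0.0],
    "dialog": [0.2, 0.8, 0.1],
    "parser": [0.0, 0.1, 0.9],
}

expanded = expand_query(["crash"], TOY_EMBEDDINGS, k=2)
print(expanded)
```

In this toy vocabulary, the two words added to the query `crash` are the error-related terms rather than the user-interface or parsing terms, because their vectors point in nearly the same direction. Because the expansion terms come from the embedding space rather than from first-retrieved files, this step does not depend on the retrieval model, which is the dependency the abstract says WEQE aims to avoid.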