Meili Lu, Xiaobing Sun, Shaowei Wang, D. Lo, Yucong Duan
{"title":"Query expansion via WordNet for effective code search","authors":"Meili Lu, Xiaobing Sun, Shaowei Wang, D. Lo, Yucong Duan","doi":"10.1109/SANER.2015.7081874","DOIUrl":null,"url":null,"abstract":"Source code search plays an important role in software maintenance. The effectiveness of source code search not only relies on the search technique, but also on the quality of the query. In practice, software systems are large, thus it is difficult for a developer to format an accurate query to express what really in her/his mind, especially when the maintainer and the original developer are not the same person. When a query performs poorly, it has to be reformulated. But the words used in a query may be different from those that have similar semantics in the source code, i.e., the synonyms, which will affect the accuracy of code search results. To address this issue, we propose an approach that extends a query with synonyms generated from WordNet. Our approach extracts natural language phrases from source code identifiers, matches expanded queries with these phrases, and sorts the search results. It allows developers to explore word usage in a piece of software, helps them quickly identify relevant program elements for investigation or quickly recognize alternative words for query reformulation. Our initial empirical study on search tasks performed on the JavaScript/ECMAScript interpreter and compiler, Rhino, shows that the synonyms used to expand the queries help recommend good alternative queries. Our approach also improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.","PeriodicalId":355949,"journal":{"name":"2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"148","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2015.7081874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 148
Abstract
Source code search plays an important role in software maintenance. The effectiveness of source code search not only relies on the search technique, but also on the quality of the query. In practice, software systems are large, thus it is difficult for a developer to format an accurate query to express what really in her/his mind, especially when the maintainer and the original developer are not the same person. When a query performs poorly, it has to be reformulated. But the words used in a query may be different from those that have similar semantics in the source code, i.e., the synonyms, which will affect the accuracy of code search results. To address this issue, we propose an approach that extends a query with synonyms generated from WordNet. Our approach extracts natural language phrases from source code identifiers, matches expanded queries with these phrases, and sorts the search results. It allows developers to explore word usage in a piece of software, helps them quickly identify relevant program elements for investigation or quickly recognize alternative words for query reformulation. Our initial empirical study on search tasks performed on the JavaScript/ECMAScript interpreter and compiler, Rhino, shows that the synonyms used to expand the queries help recommend good alternative queries. Our approach also improves the precision and recall of Conquer, a state-of-the-art query expansion/reformulation technique, by 5% and 8% respectively.