CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)

Fei Lv, Hongyu Zhang, Jian-Guang Lou, Shaowei Wang, D. Zhang, Jianjun Zhao
{"title":"CodeHow: Effective Code Search Based on API Understanding and Extended Boolean Model (E)","authors":"Fei Lv, Hongyu Zhang, Jian-Guang Lou, Shaowei Wang, D. Zhang, Jianjun Zhao","doi":"10.1109/ASE.2015.42","DOIUrl":null,"url":null,"abstract":"Over the years of software development, a vast amount of source code has been accumulated. Many code search tools were proposed to help programmers reuse previously-written code by performing free-text queries over a large-scale codebase. Our experience shows that the accuracy of these code search tools are often unsatisfactory. One major reason is that existing tools lack of query understanding ability. In this paper, we propose CodeHow, a code search technique that can recognize potential APIs a user query refers to. Having understood the potentially relevant APIs, CodeHow expands the query with the APIs and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search. We deploy the backend of CodeHow as a Microsoft Azure service and implement the front-end as a Visual Studio extension. We evaluate CodeHow on a large-scale codebase consisting of 26K C# projects downloaded from GitHub. The experimental results show that when the top 1 results are inspected, CodeHow achieves a precision score of 0.794 (i.e., 79.4% of the first returned results are relevant code snippets). The results also show that CodeHow outperforms conventional code search tools. Furthermore, we perform a controlled experiment and a survey of Microsoft developers. The results confirm the usefulness and effectiveness of CodeHow in programming practices.","PeriodicalId":6586,"journal":{"name":"2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"19 1","pages":"260-270"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"218","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASE.2015.42","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 218

Abstract

Over the years of software development, a vast amount of source code has been accumulated. Many code search tools were proposed to help programmers reuse previously-written code by performing free-text queries over a large-scale codebase. Our experience shows that the accuracy of these code search tools are often unsatisfactory. One major reason is that existing tools lack of query understanding ability. In this paper, we propose CodeHow, a code search technique that can recognize potential APIs a user query refers to. Having understood the potentially relevant APIs, CodeHow expands the query with the APIs and performs code retrieval by applying the Extended Boolean model, which considers the impact of both text similarity and potential APIs on code search. We deploy the backend of CodeHow as a Microsoft Azure service and implement the front-end as a Visual Studio extension. We evaluate CodeHow on a large-scale codebase consisting of 26K C# projects downloaded from GitHub. The experimental results show that when the top 1 results are inspected, CodeHow achieves a precision score of 0.794 (i.e., 79.4% of the first returned results are relevant code snippets). The results also show that CodeHow outperforms conventional code search tools. Furthermore, we perform a controlled experiment and a survey of Microsoft developers. The results confirm the usefulness and effectiveness of CodeHow in programming practices.
CodeHow:基于API理解和扩展布尔模型的有效代码搜索(E)
经过多年的软件开发,已经积累了大量的源代码。许多代码搜索工具被提出,通过在大规模代码库上执行自由文本查询来帮助程序员重用以前编写的代码。我们的经验表明,这些代码搜索工具的准确性往往不令人满意。一个主要原因是现有工具缺乏查询理解能力。在本文中,我们提出了CodeHow,这是一种代码搜索技术,可以识别用户查询所引用的潜在api。在理解了潜在的相关api之后,CodeHow使用api扩展查询,并通过应用Extended Boolean模型执行代码检索,该模型考虑了文本相似性和潜在api对代码搜索的影响。我们将CodeHow的后端部署为Microsoft Azure服务,并将前端实现为Visual Studio扩展。我们在一个由从GitHub下载的26K c#项目组成的大规模代码库上评估CodeHow。实验结果表明,当检查前1个结果时,CodeHow达到了0.794的精度分数(即79.4%的第一个返回结果是相关的代码片段)。结果还表明,CodeHow优于传统的代码搜索工具。此外,我们进行了一个控制实验和微软开发人员的调查。结果证实了CodeHow在编程实践中的有用性和有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信