Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR). Pub Date: 2018-03-22. DOI: 10.1145/3196398.3196425

Md Masudur Rahman, J. Barson, Sydney Paul, Joshua Kayan, F. Lois, S. Quezada, Chris Parnin, Kathryn T. Stolee, Baishakhi Ray

Pages: 465-475

Citations: 49

Abstract

Search is an integral part of the software development process. Developers often use search engines to look for information during development, including reusable code snippets, API understanding, and reference examples. Developers tend to prefer general-purpose search engines like Google, which are often not optimized for code-related documents and use search strategies and ranking techniques that are better suited to generic, non-code-related information. In this paper, we explore whether a general-purpose search engine like Google is an optimal choice for code-related searches. In particular, we investigate whether the performance of searching with Google varies for code vs. non-code-related searches. To analyze this, we collect search logs from 310 developers that contain nearly 150,000 search queries from Google and the associated result clicks. To differentiate between code-related and non-code-related searches, we build a model that identifies the code intent of queries. Leveraging this model, we build an automatic classifier that distinguishes code-related queries from non-code-related ones. We confirm the effectiveness of the classifier on manually annotated queries, where it achieves a precision of 87%, a recall of 86%, and an F1-score of 87%. We apply this classifier to automatically annotate all the queries in the dataset. Analyzing this dataset, we observe that code-related searching often requires more effort (e.g., time, result clicks, and query modifications) than general non-code search, which indicates that a general-purpose search engine is less effective for code search.
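The abstract reports precision, recall, and F1-score for the query classifier on manually annotated queries. As a minimal sketch of how such scores are computed for a binary code vs. non-code labeling, the following uses the standard definitions; the function name, the `Metrics` type, and the sample labels are hypothetical illustrations and are not taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def evaluate(predicted: Sequence[bool], actual: Sequence[bool]) -> Metrics:
    """Score binary predictions, where True means 'code-related query'."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(precision, recall, f1)


# Hypothetical labels for five queries (not real data from the study).
manual_labels = [True, False, True, False, True]
classifier_output = [True, True, False, False, True]
scores = evaluate(classifier_output, manual_labels)
print(f"precision={scores.precision:.2f} "
      f"recall={scores.recall:.2f} f1={scores.f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it always lies between the two values.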

