类型和查询扩展的使用是否有助于改进大规模代码搜索?

2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2015-11-23 DOI:10.1109/SCAM.2015.7335400

Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes

{"title":"类型和查询扩展的使用是否有助于改进大规模代码搜索?","authors":"Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes","doi":"10.1109/SCAM.2015.7335400","DOIUrl":null,"url":null,"abstract":"With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.","PeriodicalId":192232,"journal":{"name":"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Can the use of types and query expansion help improve large-scale code search?\",\"authors\":\"Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes\",\"doi\":\"10.1109/SCAM.2015.7335400\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.\",\"PeriodicalId\":192232,\"journal\":{\"name\":\"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCAM.2015.7335400\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM.2015.7335400","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

摘要

随着开源代码运动的兴起，以重用为目的的代码搜索变得越来越流行。以至于研究人员将其称为软件重用的新方面。尽管代码搜索在本质上不同于通用文档搜索，但大多数工具仍然主要依赖与源代码文本匹配的关键字。最近，研究人员提出了更复杂的方法来执行代码搜索，例如在查询中包含接口定义(例如，所需函数的返回和参数类型，以及关键字;这里称为接口驱动代码搜索(IDCS)。然而，据我们所知，很少有实证研究将传统的基于关键字的代码搜索(KBCS)与更先进的方法(如IDCS)进行比较。在本文中，我们描述了一个实验，比较了KBCS和IDCS在Java实现的辅助函数的大规模代码搜索任务中的有效性。我们还测量了基于类型和WordNet的查询扩展对这两种方法的影响。我们的实验涉及36个主题，这些主题为16个不同的辅助函数和一个包含超过2,000,000个Java方法的存储库生成了实际查询。结果表明，当与查询扩展相结合时，类型的使用可以提高召回率和返回的相关函数的数量(#RFR)(召回率提高~30%，#RFR提高~43%)。然而，更详细的分析表明，在某些情况下，最好只使用关键字，特别是当这些关键字足以在语义上定义所需的功能时。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Can the use of types and query expansion help improve large-scale code search?

With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., return and parameter types of the desired function, along with keywords; called here Interface-Driven Code Search - IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects that produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)

自引率

0.00%

发文量