An unabridged source code dataset for research in software reuse

Werner Janjic, Oliver Hummel, M. Schumacher, C. Atkinson
Published in: 2013 10th Working Conference on Mining Software Repositories (MSR), 2013-05-18
DOI: 10.1109/MSR.2013.6624047 (https://doi.org/10.1109/MSR.2013.6624047)
Citations: 22

Abstract

This paper describes a large, unabridged data set of Java source code gathered and shared as part of the Merobase Component Finder project of the Software-Engineering Group at the University of Mannheim. It consists of the complete index used to drive the search engine www.merobase.com, the vast majority of the source code modules accessible through it, and a tool that enables researchers to browse the collected data efficiently. We describe the techniques used to collect, format and store the data set, as well as the core capabilities of the Merobase search engine, such as classic keyword-based, interface-based and test-driven search. This data set, which represents one of the largest searchable collections of source and binary modules available online, has recently been made available for download and use in further research projects. All files are available at http://merobase.informatik.uni-mannheim.de/sources/.