Performing large-scale mining studies: from start to finish (tutorial)

Robert Dyer, Samuel W. Flint
{"title":"Performing large-scale mining studies: from start to finish (tutorial)","authors":"Robert Dyer, Samuel W. Flint","doi":"10.1145/3540250.3569448","DOIUrl":null,"url":null,"abstract":"Modern software engineering research often relies on mining open-source software repositories, to either provide motivation for their research problems and/or evaluation of the proposed approach. Mining ultra-large-scale software repositories is still a difficult task, requiring substantial expertise and access to significant hardware. Tools such as Boa can help researchers easily mine large numbers of open-source repositories. There has also recently been more of a push toward open science, with an emphasis on making replication packages available. Building such replication packages incurs additional workload for researchers. In this tutorial, we teach how to use the Boa infrastructure for mining software repository data. We leverage Boa’s VS Code IDE extension to help write and submit Boa queries, and also leverage Boa’s study template to show how researchers can more easily analyze the output from Boa and automatically produce a suitable replication package that is published on Zenodo.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"软件产业与工程","FirstCategoryId":"1089","ListUrlMain":"https://doi.org/10.1145/3540250.3569448","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Modern software engineering research often relies on mining open-source software repositories, to either provide motivation for their research problems and/or evaluation of the proposed approach. Mining ultra-large-scale software repositories is still a difficult task, requiring substantial expertise and access to significant hardware. Tools such as Boa can help researchers easily mine large numbers of open-source repositories. There has also recently been more of a push toward open science, with an emphasis on making replication packages available. Building such replication packages incurs additional workload for researchers. In this tutorial, we teach how to use the Boa infrastructure for mining software repository data. We leverage Boa’s VS Code IDE extension to help write and submit Boa queries, and also leverage Boa’s study template to show how researchers can more easily analyze the output from Boa and automatically produce a suitable replication package that is published on Zenodo.
执行大规模采矿研究:从头到尾(教程)
现代软件工程研究经常依赖于挖掘开源软件存储库,为他们的研究问题和/或评估所提出的方法提供动力。挖掘超大型软件存储库仍然是一项艰巨的任务,需要大量的专业知识和重要的硬件。像Boa这样的工具可以帮助研究人员轻松地挖掘大量的开源存储库。最近也有更多的人在推动开放科学,重点是使复制包可用。构建这样的复制包会给研究人员带来额外的工作量。在本教程中,我们将介绍如何使用Boa基础设施来挖掘软件存储库数据。我们利用Boa的VS Code IDE扩展来帮助编写和提交Boa查询,并且还利用Boa的研究模板来展示研究人员如何更轻松地分析Boa的输出并自动生成在Zenodo上发布的合适的复制包。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
676
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信