{"title":"Performing large-scale mining studies: from start to finish (tutorial)","authors":"Robert Dyer, Samuel W. Flint","doi":"10.1145/3540250.3569448","DOIUrl":null,"url":null,"abstract":"Modern software engineering research often relies on mining open-source software repositories, to either provide motivation for their research problems and/or evaluation of the proposed approach. Mining ultra-large-scale software repositories is still a difficult task, requiring substantial expertise and access to significant hardware. Tools such as Boa can help researchers easily mine large numbers of open-source repositories. There has also recently been more of a push toward open science, with an emphasis on making replication packages available. Building such replication packages incurs additional workload for researchers. In this tutorial, we teach how to use the Boa infrastructure for mining software repository data. We leverage Boa’s VS Code IDE extension to help write and submit Boa queries, and also leverage Boa’s study template to show how researchers can more easily analyze the output from Boa and automatically produce a suitable replication package that is published on Zenodo.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"141 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"软件产业与工程","FirstCategoryId":"1089","ListUrlMain":"https://doi.org/10.1145/3540250.3569448","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Modern software engineering research often relies on mining open-source software repositories, to either provide motivation for their research problems and/or evaluation of the proposed approach. Mining ultra-large-scale software repositories is still a difficult task, requiring substantial expertise and access to significant hardware. Tools such as Boa can help researchers easily mine large numbers of open-source repositories. There has also recently been more of a push toward open science, with an emphasis on making replication packages available. Building such replication packages incurs additional workload for researchers. In this tutorial, we teach how to use the Boa infrastructure for mining software repository data. We leverage Boa’s VS Code IDE extension to help write and submit Boa queries, and also leverage Boa’s study template to show how researchers can more easily analyze the output from Boa and automatically produce a suitable replication package that is published on Zenodo.