PANDORA: Continuous Mining Software Repository and Dataset Generation

H. Nguyen, Francesco Lomio, Fabiano Pecorelli, Valentina Lenarduzzi
Published in: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Publication date: 2022-03-01
DOI: https://doi.org/10.1109/saner53432.2022.00041
Citations: 6

Abstract

Mining software repository activities analyze a huge amount of data gathered from different sources. Several tools have been developed for collecting and aggregating data from repositories, but they do not make it easy for researchers to develop new extractors or to integrate data collected from other platforms, in particular platforms that periodically delete their data. Moreover, mining software repository studies are commonly performed on old versions of software projects, and their results are not commonly updated periodically. Because these studies are not continuously updated, practitioners often do not trust the results of empirical studies. To overcome these issues, in this paper we present Pandora, a tool that automatically and continuously mines data from different existing tools and online platforms and enables researchers to run mining software repository studies and continuously update their results. To evaluate the applicability of our tool, we analyzed 365 projects (developed in different languages), continuously collecting data from December 2020 to May 2021, and ran an example study investigating the build-stability of SonarQube rules.

Link to dashboard: http://sqa.rd.tuni.fi/superset/dashboard/1
Link to source code: https://github.com/clowee/PANDORA
Link to 5-minute video: https://youtu.be/CuVO9YGJ59I