From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects

Renan Vieira, Antônio da Silva, L. Rocha, J. Gomes
{"title":"From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects","authors":"Renan Vieira, Antônio da Silva, L. Rocha, J. Gomes","doi":"10.1145/3345629.3345639","DOIUrl":null,"url":null,"abstract":"Bugs appear in almost any software development. Solving all or at least a large part of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems as a way to report and monitor bug-fixing tasks. In recent years, several researchers have been conducting bug tracking analysis to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed in 9 categories. We have mined this information from Jira issue track system concerning two different perspectives of reports with closed/resolved status: static (the latest version of reports) and dynamic (the changes that have occurred in reports over time). We also extract information from the commits (if they exist) that fix such bugs from their respective version-control system (Git). We also provide a change analysis that occurs in the reports as a way of illustrating and characterizing the proposed dataset. Once the data extraction process is an error-prone nontrivial task, we believe such initiatives like this could be useful to support researchers in further more detailed investigations.","PeriodicalId":424201,"journal":{"name":"Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3345629.3345639","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Bugs appear in almost any software development. Solving all or at least a large part of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems as a way to report and monitor bug-fixing tasks. In recent years, several researchers have been conducting bug tracking analysis to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed in 9 categories. We have mined this information from Jira issue track system concerning two different perspectives of reports with closed/resolved status: static (the latest version of reports) and dynamic (the changes that have occurred in reports over time). We also extract information from the commits (if they exist) that fix such bugs from their respective version-control system (Git). We also provide a change analysis that occurs in the reports as a way of illustrating and characterizing the proposed dataset. Once the data extraction process is an error-prone nontrivial task, we believe such initiatives like this could be useful to support researchers in further more detailed investigations.
从报告到bug修复提交:来自55个Apache开源项目的10年bug修复活动数据集
几乎在任何软件开发中都会出现bug。解决所有或至少大部分问题需要大量的时间、精力和预算。软件项目通常使用问题跟踪系统作为报告和监视错误修复任务的一种方式。近年来,一些研究人员一直在进行bug跟踪分析,以更好地了解问题,从而提供降低成本和提高bug修复任务效率的方法。在本文中,我们介绍了一个新的数据集,该数据集由Apache软件基金会55个项目10年来的错误修复活动中的70,000多个错误修复报告组成,分为9个类别。我们从Jira问题跟踪系统中挖掘了这些信息,这些信息涉及具有关闭/已解决状态的报告的两个不同视角:静态(报告的最新版本)和动态(随着时间的推移报告中发生的变化)。我们还从各自的版本控制系统(Git)中修复这些错误的提交(如果存在的话)中提取信息。我们还提供了报告中发生的变化分析,作为说明和描述建议数据集的一种方式。一旦数据提取过程是一个容易出错的重要任务,我们相信这样的倡议可以帮助研究人员进行更详细的调查。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信