Multi-extract and Multi-level Dataset of Mozilla Issue Tracking History

Jiaxin Zhu, Minghui Zhou, Hong Mei
2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 472-475. Published 2016-05-14. DOI: 10.1145/2901739.2903502 (https://doi.org/10.1145/2901739.2903502)
Citations: 8

Abstract

Many studies analyze issue tracking repositories to understand and support software development. To facilitate such analyses, we share a Mozilla issue tracking dataset covering a 15-year history. The dataset includes three extracts, each available at multiple levels. The three extracts were retrieved at three different times through two channels of Mozilla Bugzilla: the front-end (the web user interface, UI) and the back-end (the official database dump). The variations (dynamics) among the extracts give researchers room to reproduce and validate their studies. They also reveal opportunities for studies that could not otherwise be conducted. For each extract we provide data levels ranging from raw data, to standardized data, to calculated data that target specific research questions. Data retrieval and processing scripts for each data level are also provided. With this multi-level structure, analysts can start an inquiry more efficiently from the standardized level and still trace the data chain when necessary (e.g., to verify whether a phenomenon reflected in the data is an actual event). We have applied this dataset in several published studies and intend to extend the multi-level and multi-extract design to other software engineering datasets.
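The sketch below illustrates the multi-level data chain and the cross-extract comparison described above. It is not the dataset's actual schema, file layout, or the authors' scripts.

The following are all hypothetical:
- the record classes and helper functions;
- the extract labels (`web-ui`, `db-dump`);
- the `source_ref` pointers;
- the toy records.

The payload keys assume Bugzilla-style field names (`bug_id`, `bug_status`, `resolution`).

The sketch works in three steps:
- Each standardized issue keeps a link to the raw record it came from.
- Each calculated metric keeps the issues it was computed from, so a value can be traced back to raw data.
- Two extracts are compared by issue id.

```python
from dataclasses import dataclass, field


@dataclass
class RawRecord:
    """Raw level: one record as retrieved from a single extract, unmodified."""
    extract: str     # hypothetical extract label, e.g. "web-ui" or "db-dump"
    source_ref: str  # hypothetical pointer back to the raw artifact
    payload: dict    # keys assumed to follow Bugzilla fields (bug_id, bug_status, resolution)


@dataclass
class Issue:
    """Standardized level: normalized fields plus a link to the raw record."""
    issue_id: int
    status: str
    resolution: str
    raw: RawRecord


@dataclass
class Metric:
    """Calculated level: a value plus the issues that contributed to it."""
    name: str
    value: float
    inputs: list = field(default_factory=list)


def standardize(raw: RawRecord) -> Issue:
    """Normalize a raw payload (trim whitespace, upper-case status values)."""
    p = raw.payload
    return Issue(
        issue_id=int(p["bug_id"]),
        status=str(p.get("bug_status", "")).strip().upper(),
        resolution=str(p.get("resolution", "")).strip().upper(),
        raw=raw,
    )


def fixed_ratio(issues: list) -> Metric:
    """Share of issues resolved as FIXED; keeps the FIXED issues for tracing."""
    fixed = [i for i in issues if i.resolution == "FIXED"]
    value = len(fixed) / len(issues) if issues else 0.0
    return Metric("fixed_ratio", value, fixed)


def trace(metric: Metric) -> list:
    """Walk from the calculated level back to the raw records behind it."""
    return [(i.issue_id, i.raw.extract, i.raw.source_ref) for i in metric.inputs]


def diff_extracts(a: dict, b: dict) -> tuple:
    """Compare two extracts keyed by issue id: (only in a, only in b, changed)."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(
        k for k in a.keys() & b.keys()
        if (a[k].status, a[k].resolution) != (b[k].status, b[k].resolution)
    )
    return only_a, only_b, changed


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the dataset.
    ui = [
        RawRecord("web-ui", "ui/1", {"bug_id": "1", "bug_status": "resolved", "resolution": "fixed"}),
        RawRecord("web-ui", "ui/2", {"bug_id": "2", "bug_status": "new"}),
    ]
    dump = [
        RawRecord("db-dump", "dump/1", {"bug_id": "1", "bug_status": "RESOLVED", "resolution": "FIXED"}),
        RawRecord("db-dump", "dump/2", {"bug_id": "2", "bug_status": "RESOLVED", "resolution": "WONTFIX"}),
        RawRecord("db-dump", "dump/3", {"bug_id": "3", "bug_status": "NEW"}),
    ]
    std_ui = {i.issue_id: i for i in map(standardize, ui)}
    std_dump = {i.issue_id: i for i in map(standardize, dump)}

    m = fixed_ratio(list(std_dump.values()))
    print(m.name, round(m.value, 3))        # fixed_ratio 0.333
    print(trace(m))                         # [(1, 'db-dump', 'dump/1')]
    print(diff_extracts(std_ui, std_dump))  # ([], [3], [2])
```

Explicit back-references are only one simple way to keep the data chain traceable. A pipeline over a 15-year history would more plausibly store provenance as ids in files or database tables than as in-memory object links.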