Sean Banerjee, Jordan Helmick, Zahid A. Syed, B. Cukic
{"title":"Eclipse vs. Mozilla: A Comparison of Two Large-Scale Open Source Problem Report Repositories","authors":"Sean Banerjee, Jordan Helmick, Zahid A. Syed, B. Cukic","doi":"10.1109/HASE.2015.45","DOIUrl":null,"url":null,"abstract":"Bug tracking systems play an important role in the development and maintenance of large-scale software systems. Having access to open source bug tracking systems has allowed researchers to take advantage of rich datasets and propose solutions to manage duplicate report classification, developer assignment and quality assessment. In spite of research advances, our understanding of the content of these repositories remains limited, primarily because of their size. In many cases, researchers analyze small portions of datasets thus limiting the understanding of the dynamics of problem reporting. The objective of this study is to explore the properties of two large-scale open source problem report repositories. The Eclipse dataset, at the time of download, consisted of 363; 770 reports spanning 11+ years, whereas Mozilla contained 699; 085 reports spanning 14+ years.Our research examines the evolution of datasets over time by analyzing the changes in the repository and the profiles of users who submit problem reports. We provide quantitative evidence on how submitter's maturity reduces the propensity to submit poor quality, insignificant or duplicate reports. We show that a diverse user base, characteristic of Mozilla, creates challenges for the development team as they spend more time triaging, rather than fixing, issues. Finally, we provide the research community with a series of observations and suggestions on how to study large-scale problem repositories.","PeriodicalId":248645,"journal":{"name":"2015 IEEE 16th International Symposium on High Assurance Systems Engineering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 16th International Symposium on High Assurance Systems Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HASE.2015.45","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
Bug tracking systems play an important role in the development and maintenance of large-scale software systems. Having access to open source bug tracking systems has allowed researchers to take advantage of rich datasets and propose solutions to manage duplicate report classification, developer assignment and quality assessment. In spite of research advances, our understanding of the content of these repositories remains limited, primarily because of their size. In many cases, researchers analyze small portions of datasets thus limiting the understanding of the dynamics of problem reporting. The objective of this study is to explore the properties of two large-scale open source problem report repositories. The Eclipse dataset, at the time of download, consisted of 363; 770 reports spanning 11+ years, whereas Mozilla contained 699; 085 reports spanning 14+ years.Our research examines the evolution of datasets over time by analyzing the changes in the repository and the profiles of users who submit problem reports. We provide quantitative evidence on how submitter's maturity reduces the propensity to submit poor quality, insignificant or duplicate reports. We show that a diverse user base, characteristic of Mozilla, creates challenges for the development team as they spend more time triaging, rather than fixing, issues. Finally, we provide the research community with a series of observations and suggestions on how to study large-scale problem repositories.