{"title":"Mining Workspace Updates in CVS","authors":"Thomas Zimmermann","doi":"10.1109/MSR.2007.22","DOIUrl":"https://doi.org/10.1109/MSR.2007.22","url":null,"abstract":"The version control archive CVS records not only all changes in a project but also activity data such as when developers create or update their workspaces. Furthermore, CVS records when it has to integrate changes because of parallel development. In this paper, we analyze the CVS activity data of four large open-source projects CCC, JBOSS, JEDIT, and PYTHON to investigate parallel development: What is the degree of parallel development? How frequently do conflicts occur during updates and how are they resolved? How do we identify changes that contain integrations?","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130392245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Data Mining in Software Archives to Detect How Developers Work Together","authors":"P. Weißgerber, M. Pohl, Michael Burch","doi":"10.1109/MSR.2007.34","DOIUrl":"https://doi.org/10.1109/MSR.2007.34","url":null,"abstract":"Analyzing the check-in information of open source software projects which use a version control system such as CVS or SUBVERSION can yield interesting and important insights into the programming behavior of developers. As in every major project tasks are assigned to many developers, the development must be coordinated between these programmers. This paper describes three visualization techniques that help to examine how programmers work together, e.g. if they work as a team or if they develop their part of the software separate from each other. Furthermore, phases of stagnation in the lifetime of a project can be uncovered and thus, possible problems are revealed. To demonstrate the usefulness of these visualization techniques we performed case studies on two open source projects. In these studies interesting patterns of developers' behavior, e.g. the specialization on a certain module can be observed. Moreover, modules that have been changed by many developers can be identified as well as such ones that have been altered by only one programmer.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114895467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining CVS Repositories to Understand Open-Source Project Developer Roles","authors":"Liguo Yu, S. Ramaswamy","doi":"10.1109/MSR.2007.19","DOIUrl":"https://doi.org/10.1109/MSR.2007.19","url":null,"abstract":"This paper presents a model to represent the interactions of distributed open-source software developers and utilizes data mining techniques to derive developer roles. The model is then applied on case studies of two open-source projects, ORAC-DR and Mediawiki with encouraging results.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116044199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defect Data Analysis Based on Extended Association Rule Mining","authors":"Shuji Morisaki, Akito Monden, Tomoko Matsumura, Haruaki Tamada, Ken-ichi Matsumoto","doi":"10.1109/MSR.2007.5","DOIUrl":"https://doi.org/10.1109/MSR.2007.5","url":null,"abstract":"This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristic of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation so that conditions producing distinctive statistics can be discovered As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on extracted rules include: (l)Defects detected in coding/unit testing were easily corrected (less than 7% of mean effort) when they are related to data output or validation of input data. (2)Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in case of low reproducibility, (i)Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handing. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort since such defects sometimes cause excess effort.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"22 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120836620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Can OSS Mailing Lists Tell Us? A Preliminary Psychometric Text Analysis of the Apache Developer Mailing List","authors":"Peter C. Rigby, A. Hassan","doi":"10.1109/MSR.2007.35","DOIUrl":"https://doi.org/10.1109/MSR.2007.35","url":null,"abstract":"Developer mailing lists are a rich source of information about Open Source Software (OSS) development. The unstructured nature of email makes extracting information difficult. We use a psychometrically-based linguistic analysis tool, the LIWC, to examine the Apache httpd server developer mailing list. We conduct three preliminary experiments to assess the appropriateness of this tool for information extraction from mailing lists. First, using LIWC dimensions that are correlated with the big five personality traits, we assess the personality of four top developers against a baseline for the entire mailing list. The two developers that were responsible for the major Apache releases had similar personalities. Their personalities were different from the baseline and the other developers. Second, the first and last 50 emails for two top developers who have left the project are examined. The analysis shows promise in understanding why developers join and leave a project. Third, we examine word usage on the mailing list for two major Apache releases. The differences may reflect the relative success of each release.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126370247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hemant Joshi, Chuanlei Zhang, S. Ramaswamy, Coskun Bayrak
{"title":"Local and Global Recency Weighting Approach to Bug Prediction","authors":"Hemant Joshi, Chuanlei Zhang, S. Ramaswamy, Coskun Bayrak","doi":"10.1109/MSR.2007.17","DOIUrl":"https://doi.org/10.1109/MSR.2007.17","url":null,"abstract":"Finding and fixing software bugs is a challenging maintenance task, and a significant amount of effort is invested by software development companies on this issue. In this paper, we use the Eclipse project's recorded software bug history to predict occurrence of future bugs. The history contains information on when bugs have been reported and subsequently fixed.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114063315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prioritizing Warning Categories by Analyzing Software History","authors":"Sunghun Kim, Michael D. Ernst","doi":"10.1109/MSR.2007.26","DOIUrl":"https://doi.org/10.1109/MSR.2007.26","url":null,"abstract":"Automatic bug finding tools tend to have high false positive rates: most warnings do not indicate real bugs. Usually bug finding tools prioritize each warning category. For example, the priority of \"overflow \" is 1 and the priority of \"jumbled incremental\" is 3, but the tools 'prioritization is not very effective. In this paper, we prioritize warning categories by analyzing the software change history. The underlying intuition is that if warnings from a category are resolved quickly by developers, the warnings in the category are important. Experiments with three bug finding tools (FindBugs, JLint, and PMD) and two open source projects (Columba and jEdit) indicate that different warning categories have very different lifetimes. Based on that observation, we propose a preliminary algorithm for warning category prioritizing.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114571060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simone Livieri, Yoshiki Higo, M. Matsushita, Katsuro Inoue
{"title":"Analysis of the Linux Kernel Evolution Using Code Clone Coverage","authors":"Simone Livieri, Yoshiki Higo, M. Matsushita, Katsuro Inoue","doi":"10.1109/MSR.2007.1","DOIUrl":"https://doi.org/10.1109/MSR.2007.1","url":null,"abstract":"Most studies of the evolution of software systems are based on the comparison of simple software metrics. In this paper, we present our preliminary investigation of the evolution of the Linux kernel using code-clone analysis and the code-clone coverage metrics. We examined 136 versions of the stable Linux kernel using a distributed extension of the code clone detection tool CCFinder. The result is shown as a heat map.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124824884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, A. Zeller
{"title":"How Long Will It Take to Fix This Bug?","authors":"Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, A. Zeller","doi":"10.1109/MSR.2007.13","DOIUrl":"https://doi.org/10.1109/MSR.2007.13","url":null,"abstract":"Predicting the time and effort for a software problem has long been a difficult task. We present an approach that automatically predicts the fixing effort, i.e., the person-hours spent on fixing an issue. Our technique leverages existing issue tracking systems: given a new issue report, we use the Lucene framework to search for similar, earlier reports and use their average time as a prediction. Our approach thus allows for early effort estimation, helping in assigning issues and scheduling stable releases. We evaluated our approach using effort data from the JBoss project. Given a sufficient number of issues reports, our automatic predictions are close to the actual effort; for issues that are bugs, we are off by only one hour, beating naive predictions by a factor of four.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129421574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Eclipse Bug Lifetimes","authors":"Lucas D. Panjer","doi":"10.1109/MSR.2007.25","DOIUrl":"https://doi.org/10.1109/MSR.2007.25","url":null,"abstract":"In non-trivial software development projects planning and allocation of resources is an important and difficult task. Estimation of work time to fix a bug is commonly used to support this process. This research explores the viability of using data mining tools to predict the time to fix a bug given only the basic information known at the beginning of a bug's lifetime. To address this question, a historical portion of the Eclipse Bugzilla database is used for modeling and predicting bug lifetimes. A bug history transformation process is described and several data mining models are built and tested. Interesting behaviours derived from the models are documented. The models can correctly predict up to 34.9% of the bugs into a discretized log scaled lifetime class.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127563007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}