{"title":"Finding Relevant Applications for Prototyping","authors":"M. Grechanik, Kevin M. Conroy, Katharina Probst","doi":"10.1109/MSR.2007.9","DOIUrl":"https://doi.org/10.1109/MSR.2007.9","url":null,"abstract":"When gathering requirements for new software projects, it is often cost-effective to find similar applications that can be used as the basis for prototypes rather than building them from scratch. However, finding such sample applications can be difficult, often making prototyping time-consuming and expensive. We offer a novel approach called Exemplar (EXEcutable exaMPLes ARchive) for finding highly relevant software projects from a large archive of executable applications. After a programmer enters a query that contains high-level concepts (e.g., toolbar, download, smart card), Exemplar uses information retrieval and program analysis to retrieve applications that implement these concepts. We hypothesize that Exemplar will be effective and efficient in helping programmers to quickly find highly relevant applications to support prototyping.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121973294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting the Number of Changes in Eclipse Using Time Series Analysis","authors":"I. Herraiz, Jesus M. Gonzalez-Barahona, G. Robles","doi":"10.1109/MSR.2007.10","DOIUrl":"https://doi.org/10.1109/MSR.2007.10","url":null,"abstract":"In order to predict the number of changes in the following months for the project Eclipse, we have applied a statistical (non-explanatory) model based on time series analysis. We have obtained the monthly number of changes in the CVS repository of Eclipse, using the CVSAnalY tool. The input to our model was the filtered series of the number of changes per month, and the output was the number of changes per month for the next three months. We then aggregated the results of the three months to obtain the total number of changes for the period specified by the challenge.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130809645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Approaches to Mining Source Code for Call-Usage Patterns","authors":"Huzefa H. Kagdi, M. Collard, Jonathan I. Maletic","doi":"10.1109/MSR.2007.3","DOIUrl":"https://doi.org/10.1109/MSR.2007.3","url":null,"abstract":"Two approaches for mining function-call usage patterns from source code are compared. The first approach, itemset mining, has recently been applied to this problem. The other approach, sequential-pattern mining, has not been previously applied to this problem. Here, a call-usage pattern is a composition of function calls that occur in a function definition. Both approaches look for frequently occurring patterns that represent standard usage of functions and identify possible errors. Itemset mining produces unordered patterns, i.e., sets of function calls, whereas sequential-pattern mining produces partially ordered patterns, i.e., sequences of function calls. The trade-off between the additional ordering context given by sequential-pattern mining and the efficiency of itemset mining is investigated. The two approaches are applied to the Linux kernel v2.6.14, and the results show that mining ordered patterns is worth the additional cost.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128090679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Single-Version and Evolutionary Dependencies for Software-Change Prediction","authors":"Huzefa H. Kagdi, Jonathan I. Maletic","doi":"10.1109/MSR.2007.2","DOIUrl":"https://doi.org/10.1109/MSR.2007.2","url":null,"abstract":"The paper advocates the need for the investigation and development of a software-change prediction methodology that combines the change sets estimated from software dependency analysis (via single-version analysis) and the actual change sets found in software version histories (via multiple-version analysis). Traditionally prescribed methodologies such as Impact Analysis (IA) are based on the former, whereas a more recent methodology, mining software repositories (MSR), is based on the latter. The research hypothesis is that combining these two methodologies will result in improved overall support for software-change prediction.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117331545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of the Creation of the Mozilla Foundation in the Activity of Developers","authors":"Jesus M. Gonzalez-Barahona, G. Robles, I. Herraiz","doi":"10.1109/MSR.2007.15","DOIUrl":"https://doi.org/10.1109/MSR.2007.15","url":null,"abstract":"During 2003, the Mozilla project transitioned from being company-promoted (sponsored by AOL) to being community-promoted (sponsored by the Mozilla Foundation). What happened to the group of developers during this transition? Was there any significant impact on its activity or composition? To answer these questions, we have performed an analysis of the CVS repository of Mozilla using the CVSAnalY tool, finding little impact on activity but dramatic changes in the composition of the development team.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128252924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Theoretical Model for Software Growth","authors":"I. Herraiz, Jesus M. Gonzalez-Barahona, G. Robles","doi":"10.1109/MSR.2007.31","DOIUrl":"https://doi.org/10.1109/MSR.2007.31","url":null,"abstract":"Software growth (and more broadly, software evolution) is usually considered in terms of size or complexity of source code. However, different studies usually use different metrics, which makes it difficult to compare approaches and results. In addition, not all metrics are equally easy to calculate for a given body of source code, which raises the question of which one is the easiest to calculate without losing too much information. To address both issues, in this paper we present a comprehensive study, based on the analysis of about 700,000 C source code files, calculating several size and complexity metrics for all of them. For this sample, we have found double Pareto statistical distributions for all metrics considered, and a high correlation between any two of them. This would imply that any model addressing software growth should produce these Pareto distributions, and that analyses based on any of the considered metrics should show a similar pattern, provided the sample of files considered is large enough.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122470237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining a Change-Based Software Repository","authors":"R. Robbes","doi":"10.1109/MSR.2007.18","DOIUrl":"https://doi.org/10.1109/MSR.2007.18","url":null,"abstract":"Although state-of-the-art software repositories based on versioning system information are useful to assess the evolution of a software system, the information they contain is limited in several ways. Versioning systems such as CVS or Subversion store only snapshots of text files, leading to a loss of information: the exact sequence of changes between two versions is hard to recover. In this paper we present an alternative information repository which stores incremental changes to the system under study, retrieved from the IDE used to build the software. We then use this change-based model of system evolution to assess when refactorings happen in two case studies, and compare our findings with refactoring detection approaches on classical versioning system repositories.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131230609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlating Social Interactions to Release History during Software Evolution","authors":"Olga Baysal, A. Malton","doi":"10.1109/MSR.2007.4","DOIUrl":"https://doi.org/10.1109/MSR.2007.4","url":null,"abstract":"In this paper, we propose a method to reason about the nature of software changes by mining and correlating discussion archives. We employ an information retrieval approach to find correlations between source code change history and the history of social interactions surrounding these changes. We apply our correlation method to two software systems, LSEdit and Apache Ant. The results of these exploratory case studies provide evidence of similarity between the content of free-form text emails among developers and the actual modifications in the code. We identify a set of correlation patterns between discussion and changed code vocabularies and discover that some releases referred to as minor should instead fall under the major category. These patterns can be used to estimate the type of a change and the time needed to implement it.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126566621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Release Pattern Discovery via Partitioning: Methodology and Case Study","authors":"Abram Hindle, Michael W. Godfrey, R. Holt","doi":"10.1109/MSR.2007.28","DOIUrl":"https://doi.org/10.1109/MSR.2007.28","url":null,"abstract":"The development of Open Source systems produces a variety of software artifacts such as source code, version control records, bug reports, and email discussions. Since the development is distributed across different tool environments and developer practices, any analysis of project behavior must be inferred from whatever common artifacts happen to be available. In this paper, we propose an approach to characterizing a project's behavior around the time of major and minor releases; we do this by partitioning the observed activities, such as artifact check-ins, around the dates of major and minor releases, and then looking for recognizable patterns. We validate this approach by means of a case study on the MySQL database system; in this case study, we found patterns which suggested MySQL was behaving consistently within itself. These patterns included testing and documenting that took place more often before a release than after, and a dip in the rate of source code changes around release time.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128396751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Harmfulness of Cloning: A Change Based Experiment","authors":"A. Lozano, M. Wermelinger, B. Nuseibeh","doi":"10.1109/MSR.2007.8","DOIUrl":"https://doi.org/10.1109/MSR.2007.8","url":null,"abstract":"Cloning is considered a harmful practice for software maintenance because it requires consistent changes to the entities that share a cloned fragment. However, this claim has been neither confirmed nor refuted empirically. Therefore, we have developed a prototype tool, CloneTracker, in order to study the rate of change of applications containing clones. This paper describes CloneTracker and illustrates its preliminary application in a case study.","PeriodicalId":201749,"journal":{"name":"Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128729993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}