2009 6th IEEE International Working Conference on Mining Software Repositories: Latest Publications

Mining the history of synchronous changes to refine code ownership

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069492

Lile Hattori, Michele Lanza

Citations: 45

Code siblings: Technical and legal implications of copying code between applications

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069483

D. Germán, M. D. Penta, Yann-Gaël Guéhéneuc, G. Antoniol

Abstract: Source code cloning does not happen within a single system only. It can also occur between one system and another. We use the term code sibling to refer to a code clone that evolves in a different system than the code from which it originates. Code siblings can only occur when the source code copyright owner allows it and when the conditions imposed by such license are not incompatible with the license of the destination system. In some situations copying of source code fragments is legally allowed in one direction, but not in the other. In this paper, we use clone detection, license mining and classification, and change history techniques to understand how code siblings under different licenses flow in one direction or the other between Linux and two BSD Unixes, FreeBSD and OpenBSD. Our results show that, in most cases, this migration appears to happen according to the terms of the license of the original code being copied, always favoring copying from less restrictive licenses towards more restrictive ones. We also discovered that sometimes code is inserted into the kernels from an outside source.

Citations: 96

From work to word: How do software developers describe their work?

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069490

W. Maalej, Hans-Jörg Happel

Citations: 42

Tracking concept drift of software projects using defect prediction quality

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069480

J. Ekanayake, Jonas Tappolet, H. Gall, A. Bernstein

Abstract: Defect prediction is an important task in the mining of software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate the reasons why the prediction quality fluctuates so much due to the changing nature of the bug (or defect) fixing process. To that end, we adopt the notion of a concept drift, which denotes that the defect prediction model has become unsuitable because the set of influencing features has changed, usually due to a change in the underlying bug generation process (i.e., the concept). We explore four open source projects (Eclipse, OpenOffice, Netbeans and Mozilla) and construct file-level and project-level features for each of them from their respective CVS and Bugzilla repositories. We then use this data to build defect prediction models and visualize the prediction quality along the time axis. These visualizations allow us to identify concept drifts and, as a consequence, phases of stability and instability expressed in the level of defect prediction quality. Further, we identify those project features that influence the defect prediction quality, using both a tree-induction algorithm and a linear regression model. Our experiments uncover that software systems are subject to considerable concept drifts in their evolution history. Specifically, we observe that the change in the number of authors editing a file and the number of defects fixed by them contribute to a project's concept drift and therefore influence the defect prediction quality. Our findings suggest that project managers using defect prediction models for decision making should be aware of the actual phase of stability or instability due to a potential concept drift.

Citations: 70
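
The abstract above describes following prediction quality along the time axis to find phases of stability and instability. A minimal sketch of that idea follows, assuming a per-period quality score is already available. The function name `label_phases`, the trailing-mean rule, the `window` and `tolerance` parameters, and the sample scores are all illustrative assumptions; they are not the authors' method and not results from the paper.

```python
def label_phases(quality, window=3, tolerance=0.1):
    """Label each period 'stable' or 'unstable' from a prediction-quality series.

    Hypothetical rule: a period is 'unstable' when its quality deviates from
    the mean of the preceding `window` periods by more than `tolerance`.
    """
    labels = []
    for i, q in enumerate(quality):
        history = quality[max(0, i - window):i]
        if not history:
            # First period: nothing to compare against yet.
            labels.append("stable")
            continue
        baseline = sum(history) / len(history)
        labels.append("unstable" if abs(q - baseline) > tolerance else "stable")
    return labels


# Made-up quality scores (for example, one accuracy-like value per period).
scores = [0.80, 0.82, 0.81, 0.60, 0.62, 0.61, 0.80]
for period, (q, label) in enumerate(zip(scores, label_phases(scores))):
    print(period, q, label)
```

A rule this simple flags both the drop and the later recovery as unstable periods, which matches the reading of a drift as any change in the level of prediction quality rather than only a decline.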

Evolution of the core team of developers in libre software projects

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069497

G. Robles, Jesus M. Gonzalez-Barahona, I. Herraiz

Citations: 90

Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069499

Adrian Kuhn

Citations: 29

Evaluating process quality in GNOME based on change request data

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069485

Holger Schackmann, H. Lichter

Citations: 13

Mining source code to automatically split identifiers for software analysis

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069482

Eric Enslen, Emily Hill, L. Pollock, K. Vijay-Shanker

Abstract: Automated software engineering tools (e.g., program search, concern location, code reuse, and quality assessment) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their constituent words. Unlike natural languages, where space and punctuation are used to delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating sequences of words together with no discernible delineation, which poses challenges to automatic identifier splitting. In this paper, we present an algorithm to automatically split identifiers into sequences of words by mining word frequencies in source code. With these word frequencies, our identifier splitter uses a scoring technique to automatically select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our Samurai approach outperforms the existing state-of-the-art techniques.

Citations: 193
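
The abstract above combines two ideas: splitting on naming conventions such as camel case, and using word frequencies to score candidate partitions of tokens that have no visible delimiter. The sketch below illustrates both in a simplified form. It is not the Samurai algorithm; the frequency table `WORD_FREQ`, the penalty constants, and all function names are assumptions made for illustration, and a real table would be mined from a source code corpus.

```python
import math
import re

# Hypothetical word-frequency table (assumed values, not mined data).
WORD_FREQ = {"get": 900, "file": 700, "name": 800, "user": 600, "list": 500, "id": 300}
TOTAL = sum(WORD_FREQ.values())
UNKNOWN_CHAR_LOGP = math.log(1e-4)  # assumed per-character cost of an unseen string
UNKNOWN_WORD_PENALTY = 1.0          # assumed extra cost per unseen word


def split_on_conventions(identifier):
    """Split on underscores, digits, and camel-case boundaries."""
    tokens = []
    for part in re.split(r"[_\d]+", identifier):
        # Acronym runs (XML), capitalized words (Parser), or lowercase runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [t.lower() for t in tokens]


def word_logp(word):
    """Log-probability of a known word, or a length-based penalty otherwise."""
    if word in WORD_FREQ:
        return math.log(WORD_FREQ[word] / TOTAL)
    return UNKNOWN_CHAR_LOGP * len(word) - UNKNOWN_WORD_PENALTY


def split_same_case(token):
    """Pick the highest-scoring partition of a delimiter-free token."""
    # best[i] holds (score, words) for the best partition of token[:i].
    best = [(0.0, [])]
    for end in range(1, len(token) + 1):
        candidates = []
        for start in range(end):
            prev_score, prev_words = best[start]
            piece = token[start:end]
            candidates.append((prev_score + word_logp(piece), prev_words + [piece]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


def split_identifier(identifier):
    """Apply convention-based splitting, then frequency-based splitting."""
    words = []
    for token in split_on_conventions(identifier):
        words.extend(split_same_case(token))
    return words


print(split_identifier("getUserName"))
print(split_identifier("getfilename"))
```

Scoring with summed log-probabilities means a partition only wins when every piece is a plausible word, so an unseen token such as `foo` is left whole rather than broken into arbitrary fragments.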

Mining the coherence of GNOME bug reports with statistical topic models

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069486

Erik J. Linstead, P. Baldi

Citations: 40

SourcererDB: An aggregated repository of statically analyzed and cross-linked open source Java projects

2009 6th IEEE International Working Conference on Mining Software Repositories Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069501

Joel Ossher, S. Bajracharya, Erik J. Linstead, P. Baldi, C. Lopes

Abstract: The open source movement has made vast quantities of source code available online for free, providing an extremely large dataset for empirical study and potential reuse. A major difficulty in exploiting this potential fully is that the data are currently scattered between competing source code repositories, none of which are structured for empirical analysis and cross-project comparison. As a result, software researchers and developers are left to compile their own datasets, resulting in duplicated effort and limited results. To address this challenge, we built SourcererDB, an aggregated repository of statically analyzed and cross-linked open source Java projects. SourcererDB contains local snapshots of 2,852 Java projects taken from Sourceforge, Apache and Java.net. These projects are statically analyzed to extract rich structural information, which is then stored in a relational database. References to entities in the 16,058 external jars are resolved and grouped, allowing for cross-project usage information to be accessed easily. This paper describes: (a) the mechanism for resolving and grouping these cross-project references, (b) the structure of and the metamodel for the SourcererDB repository, and (c) end-user dataset access mechanisms. Our goal in building SourcererDB is to provide a rich dataset of source code to facilitate the sharing of extracted data and to encourage reuse and repeatability of experiments.

Citations: 47