Cross-project build co-change prediction

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Pub Date: 2015-03-02, DOI: 10.1109/SANER.2015.7081841

Xin Xia, D. Lo, Shane McIntosh, Emad Shihab, A. Hassan


Citations: 38

Abstract

Build systems orchestrate how human-readable source code is translated into executable programs. In a software project, source code changes can induce changes in the build system (a.k.a. build co-changes). Because build systems are complex, it is difficult for developers to identify when build co-changes are necessary. Prediction of build co-changes works well if there is a sufficient amount of training data to build a model. In practice, however, new projects have only a limited number of changes. Using training data from other projects to predict the build co-changes in a new project can help improve the performance of build co-change prediction. We refer to this problem as cross-project build co-change prediction. In this paper, we propose CroBuild, a novel cross-project build co-change prediction approach that iteratively learns new classifiers. CroBuild constructs an ensemble of classifiers by iteratively building classifiers and assigning them weights according to their prediction error rates. Given that only a small proportion of code changes are build co-changing, we also propose an imbalance-aware approach that, in order to construct the classifier in each iteration, learns a threshold boundary between the code changes that are build co-changing and those that are not. To examine the benefits of CroBuild, we perform experiments on 4 large datasets, namely Mozilla, Eclipse-core, Lucene, and Jazz, comprising a total of 50,884 changes. On average, across the 4 datasets, CroBuild achieves an F1-score of up to 0.408. We also compare CroBuild with other approaches: a basic model, AdaBoost proposed by Freund et al., and TrAdaBoost proposed by Dai et al. On average, across the 4 datasets, CroBuild improves F1-scores by 41.54%, 36.63%, and 36.97% over the basic model, AdaBoost, and TrAdaBoost, respectively.
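The two ideas named in the abstract, weighting ensemble members by their error rate and learning a decision threshold for imbalanced data, can be illustrated with a minimal sketch. This is not CroBuild itself: the abstract does not give its formulas, so the weight below is the standard AdaBoost expression, and choosing the threshold by maximizing F1 on the training scores is an assumption made for illustration. All function names are hypothetical.

```python
import math


def f1_score(labels, predictions):
    """F1 for the positive class (1 = build co-changing change)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Assumed criterion: pick the score cut-off that maximizes training F1.

    With few positive examples, the best cut-off is often well below 0.5.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def classifier_weight(error_rate, eps=1e-9):
    """Standard AdaBoost weight (an assumption, not taken from the paper):
    a lower error rate gives a larger weight; error 0.5 gives weight 0."""
    e = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - e) / e)


def ensemble_score(member_scores, weights):
    """Weighted average of the scores produced by the ensemble members."""
    total = sum(weights)
    if total == 0:
        # Fall back to a plain mean when no member carries any weight.
        return sum(member_scores) / len(member_scores)
    return sum(w * s for w, s in zip(weights, member_scores)) / total


# Toy data: 2 build co-changing changes out of 6 (imbalanced).
scores = [0.05, 0.1, 0.2, 0.3, 0.25, 0.4]
labels = [0, 0, 0, 1, 0, 1]
threshold = best_threshold(scores, labels)
print(threshold)
```

On this toy data a fixed cut-off of 0.5 would flag no change at all, whereas the learned cut-off separates the two positive examples from the rest. A final prediction would compare `ensemble_score(...)` against the learned threshold.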

