An Experimental Evaluation of Imbalanced Learning and Time-Series Validation in the Context of CI/CD Prediction

Bohan Liu, He Zhang, Lanxin Yang, Liming Dong, Haifeng Shen, Kaiwen Song
{"title":"An Experimental Evaluation of Imbalanced Learning and Time-Series Validation in the Context of CI/CD Prediction","authors":"Bohan Liu, He Zhang, Lanxin Yang, Liming Dong, Haifeng Shen, Kaiwen Song","doi":"10.1145/3383219.3383222","DOIUrl":null,"url":null,"abstract":"Background: Machine Learning (ML) has been widely used as a powerful tool to support Software Engineering (SE). The fundamental assumptions of data characteristics required for specific ML methods have to be carefully considered prior to their applications in SE. Within the context of Continuous Integration (CI) and Continuous Deployment (CD) practices, there are two vital characteristics of data prone to be violated in SE research. First, the logs generated during CI/CD for training are imbalanced data, which is contrary to the principles of common balanced classifiers; second, these logs are also time-series data, which violates the assumption of cross-validation. Objective: We aim to systematically study the two data characteristics and further provide a comprehensive evaluation for predictive CI/CD with the data from real projects. Method: We conduct an experimental study that evaluates 67 CI/CD predictive models using both cross-validation and time-series-validation. Results: Our evaluation shows that cross-validation makes the evaluation of the models optimistic in most cases, there are a few counter-examples as well. The performance of the top 10 imbalanced models are better than the balanced models in the predictions of failed builds, even for balanced data. The degree of data imbalance has a negative impact on prediction performance. Conclusion: In research and practice, the assumptions of the various ML methods should be seriously considered for the validity of research. Even if it is used to compare the relative performance of models, cross-validation may not be applicable to the problems with time-series features. The research community need to revisit the evaluation results reported in some existing research.","PeriodicalId":334629,"journal":{"name":"Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3383219.3383222","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Background: Machine Learning (ML) has been widely used as a powerful tool to support Software Engineering (SE). The fundamental assumptions about data characteristics required by specific ML methods have to be carefully considered prior to their application in SE. Within the context of Continuous Integration (CI) and Continuous Deployment (CD) practices, two vital data characteristics are prone to be violated in SE research. First, the logs generated during CI/CD for training are imbalanced data, which is contrary to the principles of common balanced classifiers; second, these logs are also time-series data, which violates the assumption of cross-validation. Objective: We aim to systematically study these two data characteristics and further provide a comprehensive evaluation of predictive CI/CD with data from real projects. Method: We conduct an experimental study that evaluates 67 CI/CD predictive models using both cross-validation and time-series validation. Results: Our evaluation shows that cross-validation makes the evaluation of the models optimistic in most cases, although there are a few counter-examples. The top 10 imbalanced models perform better than the balanced models in predicting failed builds, even on balanced data. The degree of data imbalance has a negative impact on prediction performance. Conclusion: In research and practice, the assumptions of the various ML methods should be seriously considered for the validity of research. Even when it is used only to compare the relative performance of models, cross-validation may not be applicable to problems with time-series features. The research community needs to revisit the evaluation results reported in some existing research.
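
The validation mismatch described in the abstract is straightforward to demonstrate. The sketch below is a minimal illustration assuming a scikit-learn workflow, not the paper's actual pipeline: it contrasts shuffled k-fold cross-validation with time-ordered validation on a synthetic, imbalanced stand-in for build-log data. The synthetic dataset, the RandomForestClassifier, the class-weighting choice, and the F1 metric are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: shuffled k-fold vs. time-ordered validation on
# imbalanced data. All modeling choices here are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

# Synthetic stand-in for CI/CD build logs: ~10% "failed build" positives.
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0
)

# class_weight="balanced" is one common imbalanced-learning adjustment;
# the paper evaluates 67 model variants, which this sketch does not reproduce.
clf = RandomForestClassifier(class_weight="balanced", random_state=0)

# Shuffled k-fold lets samples from "the future" leak into training folds,
# which tends to make time-series problems look easier than they are.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X, y, cv=kfold, scoring="f1")

# TimeSeriesSplit always trains on earlier samples and tests on later ones,
# matching how a CI/CD build predictor would actually be deployed.
tscv = TimeSeriesSplit(n_splits=5)
ts_scores = cross_val_score(clf, X, y, cv=tscv, scoring="f1")

print(f"shuffled k-fold F1:   {cv_scores.mean():.3f}")
print(f"time-series-split F1: {ts_scores.mean():.3f}")
```

On genuinely temporal data that drifts over time, the k-fold estimate is typically the optimistic one, because shuffling lets the model train on builds that occur after the ones it is tested on. The synthetic data above has no temporal drift, so the printed numbers only illustrate the two protocols, not the paper's finding.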