What is the Impact of Imbalance on Software Defect Prediction Performance?

Proceedings of the 11th International Conference on Predictive Models and Data Analytics in Software Engineering. Pub Date: 2015-10-21. DOI: 10.1145/2810146.2810150

Zaheed Mahmood, David Bowes, Peter Lane, T. Hall


Citations: 33

Abstract

Software defect prediction performance varies over a large range. Menzies suggested there is a ceiling effect of 80% Recall [8]. Most of the datasets used are highly imbalanced. This paper asks: what is the empirical effect of using datasets with varying levels of imbalance on predictive performance? We use data synthesised by a previous meta-analysis of 600 fault prediction models and their results. Four model evaluation measures (the Matthews Correlation Coefficient (MCC), F-Measure, Precision and Recall) are compared to the corresponding data imbalance ratio. When the data are imbalanced, the predictive performance of software defect prediction studies is low. As the data become more balanced, predictive performance increases from an average MCC of 0.15 until the minority class makes up 20% of the instances in the dataset, where the MCC reaches an average value of about 0.34. As the proportion of the minority class increases above 20%, predictive performance does not significantly increase. Using datasets in which more than 20% of the instances are defective has not had a significant impact on predictive performance as measured by MCC. We conclude that comparisons of defect prediction results should take into account the imbalance of the data.
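The four measures named in the abstract can all be derived from a single confusion matrix. The sketch below uses the standard textbook definitions and assumes the defective class is the positive (minority) class. The function name `defect_metrics` and the example counts are hypothetical and are not taken from the paper or its meta-analysis data.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Compute Precision, Recall, F-Measure, MCC and the defective share.

    Assumes 'defective' is the positive class; counts are hypothetical inputs.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # MCC uses all four cells, so it also reflects performance on the
    # majority (non-defective) class.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "defective_share": (tp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Illustrative counts only: 200 modules, 20 of them defective (10%).
example = defect_metrics(tp=10, fp=20, tn=160, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Reporting the defective share alongside the measures makes it possible to compare results across datasets with different levels of imbalance, which is the comparison the abstract argues for.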


