Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction

George G. Cabral, Leandro L. Minku, Emad Shihab, Suhaib Mujahid
{"title":"Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction","authors":"George G. Cabral, Leandro L. Minku, Emad Shihab, Suhaib Mujahid","doi":"10.1109/ICSE.2019.00076","DOIUrl":null,"url":null,"abstract":"Just-in-Time Software Defect Prediction (JIT-SDP) is an SDP approach that makes defect predictions at the software change level. Most existing JIT-SDP work assumes that the characteristics of the problem remain the same over time. However, JIT-SDP may suffer from class imbalance evolution. Specifically, the imbalance status of the problem (i.e., how much underrepresented the defect-inducing changes are) may be intensified or reduced over time. If occurring, this could render existing JIT-SDP approaches unsuitable, including those that re-build classifiers over time using only recent data. This work thus provides the first investigation of whether class imbalance evolution poses a threat to JIT-SDP. This investigation is performed in a realistic scenario by taking into account verification latency -- the often overlooked fact that labeled training examples arrive with a delay. Based on 10 GitHub projects, we show that JIT-SDP suffers from class imbalance evolution, significantly hindering the predictive performance of existing JIT-SDP approaches. Compared to state-of-the-art class imbalance evolution learning approaches, the predictive performance of JIT-SDP approaches was up to 97.2% lower in terms of g-mean. Hence, it is essential to tackle class imbalance evolution in JIT-SDP. We then propose a novel class imbalance evolution approach for the specific context of JIT-SDP. While maintaining top ranked g-means, this approach managed to produce up to 63.59% more balanced recalls on the defect-inducing and clean classes than state-of-the-art class imbalance evolution approaches. We thus recommend it to avoid overemphasizing one class over the other in JIT-SDP.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"99 1","pages":"666-676"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 56

Abstract

Just-in-Time Software Defect Prediction (JIT-SDP) is an SDP approach that makes defect predictions at the software change level. Most existing JIT-SDP work assumes that the characteristics of the problem remain the same over time. However, JIT-SDP may suffer from class imbalance evolution. Specifically, the imbalance status of the problem (i.e., how much underrepresented the defect-inducing changes are) may be intensified or reduced over time. If occurring, this could render existing JIT-SDP approaches unsuitable, including those that re-build classifiers over time using only recent data. This work thus provides the first investigation of whether class imbalance evolution poses a threat to JIT-SDP. This investigation is performed in a realistic scenario by taking into account verification latency -- the often overlooked fact that labeled training examples arrive with a delay. Based on 10 GitHub projects, we show that JIT-SDP suffers from class imbalance evolution, significantly hindering the predictive performance of existing JIT-SDP approaches. Compared to state-of-the-art class imbalance evolution learning approaches, the predictive performance of JIT-SDP approaches was up to 97.2% lower in terms of g-mean. Hence, it is essential to tackle class imbalance evolution in JIT-SDP. We then propose a novel class imbalance evolution approach for the specific context of JIT-SDP. While maintaining top ranked g-means, this approach managed to produce up to 63.59% more balanced recalls on the defect-inducing and clean classes than state-of-the-art class imbalance evolution approaches. We thus recommend it to avoid overemphasizing one class over the other in JIT-SDP.
实时软件缺陷预测中的类不平衡演化与验证延迟
即时软件缺陷预测(JIT-SDP)是一种在软件变更级别进行缺陷预测的SDP方法。大多数现有的JIT-SDP工作都假定问题的特征随着时间的推移保持不变。然而,JIT-SDP可能会遭受阶级不平衡的演变。具体地说,问题的不平衡状态(即,引起缺陷的变化有多少未被充分代表)可能随着时间的推移而加剧或减少。如果发生这种情况,可能会使现有的JIT-SDP方法变得不合适,包括那些只使用最近的数据随时间重新构建分类器的方法。因此,这项工作提供了阶级不平衡进化是否对JIT-SDP构成威胁的首次调查。这项调查是在考虑验证延迟的现实场景中进行的——这是一个经常被忽视的事实,即标记的训练示例会延迟到达。基于10个GitHub项目,我们发现JIT-SDP存在类不平衡进化,严重阻碍了现有JIT-SDP方法的预测性能。与最先进的类失衡进化学习方法相比,JIT-SDP方法的预测性能在g-mean方面降低了97.2%。因此,解决JIT-SDP中的阶级不平衡演变问题是至关重要的。然后,我们针对JIT-SDP的具体情况提出了一种新的类不平衡演化方法。在保持顶级g均值的同时,该方法在诱导缺陷和清洁类上的平衡召回比最先进的类不平衡进化方法多出63.59%。因此,我们建议避免在JIT-SDP中过分强调一个类而不是另一个类。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信