Building an Ensemble for Software Defect Prediction Based on Diversity Selection

Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo
{"title":"Building an Ensemble for Software Defect Prediction Based on Diversity Selection","authors":"Jean Petrić, David Bowes, T. Hall, B. Christianson, Nathan Baddoo","doi":"10.1145/2961111.2962610","DOIUrl":null,"url":null,"abstract":"Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single classifier techniques. Almost all previous work using ensemble techniques in defect prediction rely on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with stacking ensemble, given the fact that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst classifiers used for building ensembles is essential to achieving these performance improvements.","PeriodicalId":208212,"journal":{"name":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"52","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2961111.2962610","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 52

Abstract

Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single classifier techniques. Almost all previous work using ensemble techniques in defect prediction rely on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with stacking ensemble, given the fact that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against 8 publicly available data sets. Conclusion: The results show performance improvement using stacking ensembles compared to other defect prediction models. Diversity amongst classifiers used for building ensembles is essential to achieving these performance improvements.
基于多样性选择的软件缺陷预测集成构建
背景:集成技术在各个科学领域得到了广泛的关注。缺陷预测研究人员已经研究了许多最先进的集成模型,并得出结论,在许多情况下,这些模型优于标准的单一分类器技术。几乎所有先前使用集成技术进行缺陷预测的工作都依赖于多数投票方案来组合预测输出,以及单个分类器之间的隐式多样性。目的:考虑到不同的分类器识别不同的缺陷集,研究是否可以使用带有堆叠集成的显式多样性技术来改进缺陷预测。方法:采用四科分类器和加权精度多样性(WAD)技术来挖掘分类器之间的多样性。为了结合单个预测,我们使用了堆叠集成技术。我们在软件缺陷预测中使用最先进的知识来构建我们的集成模型,并针对8个公开可用的数据集测试了它们的预测能力。结论:与其他缺陷预测模型相比,使用堆叠集成模型的性能有所提高。用于建筑集成的分类器之间的多样性对于实现这些性能改进至关重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信