CFIWSE: A Hybrid Preprocessing Approach for Defect Prediction on Imbalance Real-World Datasets

Jiaxi Xu, Jingwei Shang, Zhichang Huang
DOI: 10.1109/QRS-C57518.2022.00064
Published in: 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C)
Publication date: 2022-12-01
Citations: 0

Abstract

Software Defect Prediction (SDP) uses machine learning models trained on historical defect data to predict new defects. Because the distribution of software defects is highly imbalanced, building effective defect prediction models is difficult. In addition, previous studies were usually validated on public datasets of code metrics rather than on real-world data. In this work, social network analysis (SNA) metrics and code metrics are computed for 9 representative real-world projects. A hybrid preprocessing approach for defect prediction, named CFIWSE, is proposed to improve SDP performance through feature selection, minority sample synthesis, and noise reduction. CFIWSE consists of two components, CFS and IWSE. CFS performs feature selection using correlation analysis and nearest neighbor theory. IWSE uses information weights and the edited nearest neighbor rule to alleviate the overfitting and noise introduced by minority sample synthesis. The proposed method is verified by experiments on real-world data, and the contribution of each component and the method's sensitivity to its parameters are also explored.
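The abstract names the ingredients of IWSE (minority sample synthesis followed by the edited nearest neighbor rule) but does not define its information weights. For orientation only, the following is a minimal sketch of the generic, well-known combination of plain SMOTE oversampling and Wilson's edited nearest neighbor (ENN) cleaning, written in dependency-free Python. It is not the authors' IWSE: the information weights are omitted entirely, and the function names (`smote`, `enn`, `smote_enn`) and defaults (k = 5 for synthesis, k = 3 for editing, balancing to the majority class size) are hypothetical choices.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point, candidates, k):
    """Return up to k candidates closest to `point`, skipping `point` itself (by identity)."""
    others = [c for c in candidates if c is not point]
    others.sort(key=lambda c: euclidean(point, c))
    return others[:k]


def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE (hypothetical helper): each synthetic sample lies on the segment
    between a random minority sample and one of its k nearest minority neighbors.
    Assumes at least two minority samples. No information weights are applied."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def enn(X, y, k=3):
    """Wilson's edited nearest neighbor rule: keep a sample only if a strict
    majority of its k nearest neighbors (searched over all classes) share its label."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        neighbors = sorted(
            (euclidean(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )
        labels = [label for _, label in neighbors[:k]]
        if 2 * labels.count(yi) > len(labels):
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def smote_enn(X, y, minority_label=1, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class up to the majority size, then clean every class with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    synthetic = smote(minority, max(0, n_majority - len(minority)), k_smote, rng)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return enn(X_bal, y_bal, k_enn)
```

On a toy dataset with 20 non-defective samples on a small grid near the origin and 4 defective samples near (5, 5), `smote_enn` returns 20 samples of each class: the clusters are far apart, so ENN finds nothing to remove. Running ENN after synthesis is the design point: any synthetic sample that lands among majority-class neighbors is discarded, which is the noise-reduction role the abstract assigns to the edited nearest neighbor rule.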