Class Imbalance Learning to Heterogeneous Cross-Software Projects Defect Prediction

Rohit Vashisht, S. Rizvi
Journal: Int. J. Softw. Innov. (volume listed as "12 1"), journal article, published 2022-01-01
DOI: 10.4018/ijsi.292021 (https://doi.org/10.4018/ijsi.292021)
Citations: 1

Abstract

Heterogeneous cross-project defect prediction (HCPDP) attempts to forecast defects in a software application that has insufficient historical defect data. From a Class Imbalance Problem (CIP) perspective, however, one must have a clear view of the data distribution in the training dataset; otherwise, the trained model will produce biased classification results. Class Imbalance Learning (CIL) is the practice of achieving a balanced ratio between the two classes in an imbalanced dataset. A range of effective solutions exists for managing CIP, including resampling techniques such as Over-Sampling (OS) and Under-Sampling (US). This work employs the Synthetic Minority Oversampling TEchnique (SMOTE) and Random Under-Sampling (RUS) to handle CIP. In addition, the paper proposes a novel four-phase HCPDP model and compares the performance of the basic HCPDP model in the presence of CIP with its performance after CIP is handled using SMOTE and RUS, across three prediction pairs. Results show that training performance improves substantially with SMOTE, whereas RUS shows variations relative to HCPDP for all three prediction pairs.
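The two resampling strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote` and `random_under_sample`, the neighbour count `k`, the seeds, and the toy data are all assumptions chosen for illustration, and the label 1 is assumed to mark the minority (defective) class. SMOTE creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours; RUS discards randomly chosen majority rows until the classes are the same size.

```python
import math
import random


def smote(X, y, minority=1, k=3, seed=0):
    """Over-sample: add synthetic minority rows until both classes match.

    Assumes at least two minority rows and purely numeric features.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_needed = max(n_majority - len(minority_rows), 0)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the base row (excluding itself)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> neighbour
        X_new.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_new.append(minority)
    return X_new, y_new


def random_under_sample(X, y, minority=1, seed=0):
    """Under-sample: randomly drop majority rows until both classes match."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    kept = rng.sample(maj_idx, min(len(min_idx), len(maj_idx)))
    idx = sorted(min_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy, made-up data: six clean modules (0) and three defective ones (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
     [0.4, 0.2], [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_sm, y_sm = smote(X, y)
X_rus, y_rus = random_under_sample(X, y)
print("SMOTE class counts:", y_sm.count(0), y_sm.count(1))
print("RUS class counts:  ", y_rus.count(0), y_rus.count(1))
```

The sketch makes the trade-off visible: SMOTE keeps every original row and grows the training set, while RUS shrinks it and discards majority-class information, which is one plausible reason the two can behave differently. In practice a maintained library such as imbalanced-learn (`SMOTE`, `RandomUnderSampler`) would normally be used instead of hand-written code.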