Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

Sushant Kumar Pandey, A. Tripathi
Published in: 2021 8th International Conference on Smart Computing and Communications (ICSCC), 2021-07-01
DOI: https://doi.org/10.1109/ICSCC51209.2021.9528170
Citations: 3

Abstract

Software practitioners continue to build advanced software defect prediction (SDP) models to help testers find fault-prone modules. However, the class imbalance (CI) problem, in which defective instances are far fewer than non-defective ones, causes inconsistent performance. We conducted 880 experiments to analyze how the performance of 10 SDP models varies under class imbalance. Our experiments used 22 public datasets comprising 41 software metrics, 10 baseline SDP methods, and 4 sampling techniques. We used the Matthews Correlation Coefficient (MCC), which is more informative when a dataset is highly imbalanced. We also compared the predictive performance of various ML models under the 4 sampling techniques. To examine the performance of the different SDP models, we used the F-measure. We found the performance of the learning models to be unsatisfactory, a problem that needs to be mitigated. We also found a few surprising results and some logical patterns between classifier and sampling technique. The study provides a connection between sampling technique, software metrics, and classifier.
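The abstract names MCC and the F-measure as its evaluation metrics but does not define them. The sketch below uses the standard textbook definitions of both, computed from a confusion matrix, to illustrate why plain accuracy can mislead on an imbalanced defect dataset. The data (100 modules, 5 of them defective) and the function names are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, with 1 = defective and 0 = non-defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced dataset: 5 defective modules out of 100.
y_true = [1] * 5 + [0] * 95
# Hypothetical classifier: finds 1 of the 5 defects, raises 1 false alarm.
y_pred = [1, 0, 0, 0, 0] + [1] + [0] * 94

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)

print(f"accuracy  = {accuracy:.3f}")
print(f"F-measure = {f_measure(tp, fp, fn):.3f}")
print(f"MCC       = {mcc(tp, fp, tn, fn):.3f}")
```

In this made-up case the classifier misses four of the five defective modules, yet its accuracy is 0.95 because the non-defective majority dominates the count. The F-measure and MCC both come out near 0.29, which better reflects how poorly the defective class is detected.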