重新采样策略在软件缺陷预测中的应用

Lourdes Pelayo, S. Dick
{"title":"重新采样策略在软件缺陷预测中的应用","authors":"Lourdes Pelayo, S. Dick","doi":"10.1109/NAFIPS.2007.383813","DOIUrl":null,"url":null,"abstract":"Due to the tremendous complexity and sophistication of software, improving software reliability is an enormously difficult task. We study the software defect prediction problem, which focuses on predicting which modules will experience a failure during operation. Numerous studies have applied machine learning to software defect prediction; however, skewness in defect-prediction datasets usually undermines the learning algorithms. The resulting classifiers will often never predict the faulty minority class. This problem is well known in machine learning and is often referred to as learning from unbalanced datasets. We examine stratification, a widely used technique for learning unbalanced data that has received little attention in software defect prediction. Our experiments are focused on the SMOTE technique, which is a method of over-sampling minority-class examples. Our goal is to determine if SMOTE can improve recognition of defect-prone modules, and at what cost. Our experiments demonstrate that after SMOTE resampling, we have a more balanced classification. We found an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.","PeriodicalId":292853,"journal":{"name":"NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society","volume":"206 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"136","resultStr":"{\"title\":\"Applying Novel Resampling Strategies To Software Defect Prediction\",\"authors\":\"Lourdes Pelayo, S. Dick\",\"doi\":\"10.1109/NAFIPS.2007.383813\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the tremendous complexity and sophistication of software, improving software reliability is an enormously difficult task. We study the software defect prediction problem, which focuses on predicting which modules will experience a failure during operation. Numerous studies have applied machine learning to software defect prediction; however, skewness in defect-prediction datasets usually undermines the learning algorithms. The resulting classifiers will often never predict the faulty minority class. This problem is well known in machine learning and is often referred to as learning from unbalanced datasets. We examine stratification, a widely used technique for learning unbalanced data that has received little attention in software defect prediction. Our experiments are focused on the SMOTE technique, which is a method of over-sampling minority-class examples. Our goal is to determine if SMOTE can improve recognition of defect-prone modules, and at what cost. Our experiments demonstrate that after SMOTE resampling, we have a more balanced classification. We found an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.\",\"PeriodicalId\":292853,\"journal\":{\"name\":\"NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society\",\"volume\":\"206 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"136\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NAFIPS.2007.383813\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"NAFIPS 2007 - 2007 Annual Meeting of the North American Fuzzy Information Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAFIPS.2007.383813","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 136

摘要

由于软件的巨大复杂性和复杂性,提高软件的可靠性是一项极其困难的任务。我们研究了软件缺陷预测问题,主要是预测哪些模块在运行过程中会出现故障。许多研究将机器学习应用于软件缺陷预测;然而,缺陷预测数据集的偏性通常会破坏学习算法。最终的分类器通常无法预测有缺陷的少数类。这个问题在机器学习中是众所周知的,通常被称为从不平衡数据集中学习。我们的实验集中在SMOTE技术上,这是一种对少数类样本进行过度采样的方法。我们的目标是确定SMOTE是否可以改进对容易出现缺陷的模块的识别,以及代价是什么。我们的实验表明,SMOTE重采样后,我们有一个更平衡的分类。我们发现在四个基准数据集上,平均几何平均分类精度至少提高了23%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Applying Novel Resampling Strategies To Software Defect Prediction
Due to the tremendous complexity and sophistication of software, improving software reliability is an enormously difficult task. We study the software defect prediction problem, which focuses on predicting which modules will experience a failure during operation. Numerous studies have applied machine learning to software defect prediction; however, skewness in defect-prediction datasets usually undermines the learning algorithms. The resulting classifiers will often never predict the faulty minority class. This problem is well known in machine learning and is often referred to as learning from unbalanced datasets. We examine stratification, a widely used technique for learning unbalanced data that has received little attention in software defect prediction. Our experiments are focused on the SMOTE technique, which is a method of over-sampling minority-class examples. Our goal is to determine if SMOTE can improve recognition of defect-prone modules, and at what cost. Our experiments demonstrate that after SMOTE resampling, we have a more balanced classification. We found an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信