Novel resampling method for the classification of imbalanced datasets for industrial and other real-world problems

S. Cateni, V. Colla, M. Vannucci
2011 11th International Conference on Intelligent Systems Design and Applications, 2011-11-01
DOI: 10.1109/ISDA.2011.6121689 (https://doi.org/10.1109/ISDA.2011.6121689)
Citations: 12

Abstract

This paper presents a novel resampling method for coping with imbalanced datasets in binary classification problems. Imbalanced datasets are common in industrial applications: for instance, particular product defects or machine faults are rare events whose detection is of utmost importance. The proposed method combines an oversampling technique with an undersampling technique. To demonstrate its effectiveness, several tests were carried out. Two classifiers, one based on a Support Vector Machine and one on a Decision Tree, were designed and applied to binary classification on four datasets: a synthetic dataset, a widely used public dataset and two industrial datasets. The results are presented and discussed in the paper; in particular, the performance achieved by the two classifiers with the proposed resampling approach is compared with that obtained without any resampling and with the classical SMOTE approach.
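The abstract does not give the details of the proposed method, so the following is only a minimal sketch of the general idea of combining oversampling with undersampling. It is not the authors' algorithm: it uses plain random duplication of minority samples and random removal of majority samples, and the function name `combined_resample` and the parameters `minority_label`, `target_ratio` and `seed` are hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def combined_resample(X, y, minority_label=1, target_ratio=0.5, seed=0):
    """Generic random over/undersampling (an assumption, not the paper's method).

    Keeps the dataset size unchanged while raising the minority share
    to roughly `target_ratio`.
    """
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must lie strictly between 0 and 1")
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    n_total = len(y)
    # Never drop original minority samples, never invent majority samples.
    n_min = max(len(minority), round(target_ratio * n_total))
    n_maj = min(len(majority), n_total - n_min)

    # Oversampling: keep every minority sample and add random duplicates.
    extra = [rng.choice(minority) for _ in range(n_min - len(minority))]
    # Undersampling: keep a random subset of the majority, without replacement.
    kept = rng.sample(majority, n_maj)

    idx = minority + extra + kept
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 5 rare events (label 1) among 100 samples.
X = [[float(i)] for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]
X_res, y_res = combined_resample(X, y)
print(Counter(y), Counter(y_res))
```

In a comparison like the one described in the abstract, such resampling would be applied to the training split only, so that the test data keep their original class distribution.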