Novel resampling method for the classification of imbalanced datasets for industrial and other real-world problems

2011 11th International Conference on Intelligent Systems Design and Applications
Pub Date: 2011-11-01
DOI: 10.1109/ISDA.2011.6121689

S. Cateni, V. Colla, M. Vannucci

Citations: 12

Abstract

This paper presents a novel resampling method for coping with imbalanced datasets in binary classification problems. Imbalanced datasets are frequently found in many industrial applications: for instance, particular product defects or machine faults are rare events whose detection is of utmost importance. The proposed method combines an oversampling technique with an undersampling technique. To demonstrate the effectiveness of the approach, several tests were carried out. Two classifiers, one based on a Support Vector Machine (SVM) and one based on a Decision Tree, were designed and applied to binary classification on four datasets: a synthetic dataset, a widely used public dataset, and two industrial datasets. The results are presented and discussed; in particular, the performance achieved by the two classifiers with the proposed resampling approach is compared with the performance obtained without any resampling and with the classical SMOTE approach.
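The abstract does not specify how the proposed method combines oversampling and undersampling. For orientation only, the following minimal Python sketch shows a generic hybrid scheme of that kind: random undersampling of the majority class followed by SMOTE-style (Synthetic Minority Over-sampling Technique) interpolation of the minority class, SMOTE being the baseline named above. It is not the authors' method. The function names `smote` and `hybrid_resample` and the parameters `keep_majority`, `target_ratio`, `k` and `seed` are hypothetical, and the nearest-neighbour search is brute force for clarity rather than speed.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """SMOTE-style oversampling: each synthetic point lies at a random position
    on the segment joining a minority sample to one of its k nearest
    minority-class neighbours. Base samples are visited in round-robin order."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for t in range(n_new):
        i = t % len(minority)
        base = minority[i]
        # Brute-force k-nearest-neighbour search within the minority class only.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: _sq_dist(base, minority[j]))[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def hybrid_resample(X: Sequence[Point], y: Sequence[int], minority_label: int,
                    keep_majority: float = 0.5, target_ratio: float = 1.0,
                    k: int = 5, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical hybrid scheme (NOT the paper's method): random undersampling
    of the majority class, then SMOTE-style oversampling of the minority class.

    keep_majority: fraction of majority samples retained, in (0, 1].
    target_ratio:  desired minority/majority ratio after resampling.
    """
    if not 0 < keep_majority <= 1:
        raise ValueError("keep_majority must be in (0, 1]")
    rng = random.Random(seed)
    minority = [x for x, c in zip(X, y) if c == minority_label]
    majority = [(x, c) for x, c in zip(X, y) if c != minority_label]

    # Step 1: randomly discard majority samples (sampling without replacement).
    n_keep = max(1, round(len(majority) * keep_majority))
    majority_kept = rng.sample(majority, n_keep)

    # Step 2: generate synthetic minority samples until the target ratio is met.
    n_new = max(0, math.ceil(target_ratio * n_keep) - len(minority))
    synthetic = smote(minority, n_new, k=k, rng=rng) if n_new > 0 else []
    minority_all = minority + synthetic

    X_res = [x for x, _ in majority_kept] + minority_all
    y_res = [c for _, c in majority_kept] + [minority_label] * len(minority_all)
    return X_res, y_res
```

For example, with 100 majority and 5 minority samples, `keep_majority=0.5` and `target_ratio=1.0` retain 50 majority samples and add 45 synthetic minority samples, giving a balanced 50/50 set. Resampling of this kind is normally applied to the training split only, so that the evaluation data keep their original class distribution.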
