Novel resampling method for the classification of imbalanced datasets for industrial and other real-world problems
S. Cateni, V. Colla, M. Vannucci
2011 11th International Conference on Intelligent Systems Design and Applications, 2011-11-01
DOI: 10.1109/ISDA.2011.6121689 (https://doi.org/10.1109/ISDA.2011.6121689)
Citations: 12
Abstract
This paper presents a novel resampling method for coping with imbalanced datasets in binary classification problems. Imbalanced datasets are common in industrial applications: for instance, particular product defects or machine faults are rare events whose detection is of utmost importance. The proposed method combines an oversampling technique with an undersampling technique. To assess its effectiveness, several tests were carried out. Two classifiers, one based on a Support Vector Machine and one on a Decision Tree, were designed and applied to binary classification on four datasets: a synthetic dataset, a widely used public dataset, and two industrial datasets. The results are presented and discussed; in particular, the performance achieved by the two classifiers with the proposed resampling approach is compared with the performance obtained without any resampling and with the classical SMOTE approach.
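
The abstract does not spell out how the oversampling and undersampling steps are combined, so the following is only a minimal sketch of the general idea, not the authors' method and not SMOTE. It assumes the simplest possible variant: the majority class is randomly undersampled and the minority class is randomly oversampled by duplication until both reach a common size. The function name `combined_resample`, the helper `_resize`, and the `target_per_class` parameter are hypothetical names introduced for illustration.

```python
import random
from collections import Counter


def _resize(indices, target, rng):
    """Return `target` indices drawn from `indices`.

    Shrinks by sampling without replacement (undersampling) or grows by
    appending random duplicates (oversampling by duplication).
    """
    if target <= len(indices):
        return rng.sample(indices, target)
    return indices + rng.choices(indices, k=target - len(indices))


def combined_resample(X, y, target_per_class=None, seed=0):
    """Balance a binary dataset by random under- and oversampling.

    Assumption: a plain random scheme, used here only to illustrate the
    idea of combining both directions of resampling.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (maj_label, n_maj), (min_label, n_min) = counts.most_common(2)

    # Default target: midpoint between the two class sizes (an assumption).
    if target_per_class is None:
        target_per_class = (n_maj + n_min) // 2

    maj_idx = [i for i, label in enumerate(y) if label == maj_label]
    min_idx = [i for i, label in enumerate(y) if label == min_label]

    chosen = (_resize(maj_idx, target_per_class, rng)
              + _resize(min_idx, target_per_class, rng))
    rng.shuffle(chosen)  # avoid class-ordered output

    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
X_res, y_res = combined_resample(X, y)
print(Counter(y_res))  # both classes now have 50 samples
```

The resampled data would then be used only for training; a classifier such as a Support Vector Machine or a Decision Tree should still be evaluated on an untouched test set, since duplicated minority samples in the test data would distort the measured performance.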