Hybrid Oversampling Technique Based on Star Topology and Rejection Methodology for Classifying Imbalanced Data
Chaekyu Lee, Jaekwang Kim
2022 IEEE International Conference on Data Mining Workshops (ICDMW), 2022-11-01
DOI: 10.1109/ICDMW58026.2022.00033 (https://doi.org/10.1109/ICDMW58026.2022.00033)
Citation count: 0
Abstract
In this paper, we propose the star topology and rejection method (STARM), a new oversampling technique that performs well across a wide range of datasets and learning algorithms. STARM is a hybrid technique that combines the advantages of Polynom-fit-SMOTE, LEE, and SMOTE, each of which achieves high performance through different technical features, while eliminating their disadvantages. To verify that the proposed technique performs well in general settings, we conducted 28,028 experiments (77 oversampling techniques × 4 machine learning algorithms × 91 imbalanced datasets of various types) comparing predictive performance. Across these experiments, STARM yielded the highest average performance among the 77 techniques. It also showed excellent performance across different algorithms, imbalance ratios, and data volumes.
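The abstract does not describe how STARM generates samples, so the following is only a minimal sketch of classic SMOTE-style interpolation, one of the three techniques STARM is said to build on. It is not the authors' method. The function name `smote_sketch` and the parameters `k` and `seed` are illustrative assumptions, and the toy data is made up for the example.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    This is plain SMOTE, not STARM; k and seed are illustrative parameters.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because each synthetic point is a convex combination of two existing minority samples, it always falls inside the region already spanned by the minority class. Per the abstract, hybrid techniques such as STARM aim to keep the strengths of this kind of approach while removing its weaknesses.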