How Does Oversampling Affect the Performance of Classification Algorithms?
Authors: Zhizheng Xiang, Yingying Xu, Zhenzhou Tang
Venue: 2023 IEEE Symposium on Computers and Communications (ISCC)
Publication date: 2023-07-09
DOI: https://doi.org/10.1109/ISCC58397.2023.10218099
Citation count: 0
Abstract
To address the problem of classifying imbalanced datasets, this study explores how different oversampling algorithms and imbalance ratios affect the performance of classification algorithms. Two oversampling algorithms, random oversampling and the Synthetic Minority Oversampling Technique (SMOTE), are used to adjust the imbalance ratio of the training dataset to 999:1, 99:1, 9:1, 3:1, and 1:1. Four classification methods, the Convolutional Neural Network, Vision Transformer, XGBoost, and CatBoost, are evaluated using performance metrics such as precision, recall, AUC, and F2-Score. We conduct more than 240 experiments and observe that the oversampling ratio has a significant positive impact on AUC and recall, but a negative impact on precision. The study also identifies the best oversampling algorithm and imbalance ratio for each classification algorithm. Notably, the Vision Transformer used in this study has not been employed in previous research on imbalanced data classification.
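The abstract does not describe the authors' implementation, so the following is only a minimal sketch of two ideas it mentions: random oversampling of the training set to a target imbalance ratio, and the F2-Score. It assumes binary labels and reads a ratio such as 9:1 as majority:minority. The function names `random_oversample` and `f_beta` are hypothetical and are not taken from the paper.

```python
import random


def random_oversample(X, y, ratio, minority_label=1, seed=0):
    """Duplicate random minority samples until majority:minority is about ratio:1.

    Assumption: binary labels, and `ratio` is the majority count per minority sample.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    if not minority:
        raise ValueError("no minority samples to duplicate")
    n_majority = len(y) - len(minority)
    # Minority count needed for the requested ratio; never drop existing samples.
    target = max(len(minority), round(n_majority / ratio))
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def f_beta(y_true, y_pred, beta=2.0, positive=1):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy training set at 999:1, resampled to the ratios listed in the abstract.
    X = list(range(1000))
    y = [0] * 999 + [1]
    for ratio in (999, 99, 9, 3, 1):
        _, y_res = random_oversample(X, y, ratio)
        print(ratio, y_res.count(0), y_res.count(1))
```

Only the training split should be resampled in this way; the test split keeps its original distribution so that precision, recall, and F2 reflect the real class balance. SMOTE differs in that it interpolates new minority samples between neighbours instead of duplicating existing ones, which this sketch does not attempt.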