{"title":"How Does Oversampling Affect the Performance of Classification Algorithms?","authors":"Zhizheng Xiang, Yingying Xu, Zhenzhou Tang","doi":"10.1109/ISCC58397.2023.10218099","journal":{"name":"2023 IEEE Symposium on Computers and Communications (ISCC)"},"publicationDate":"2023-07-09","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ISCC58397.2023.10218099"}
How Does Oversampling Affect the Performance of Classification Algorithms?
To address the problem of classifying imbalanced datasets, this study explores how different oversampling algorithms and imbalance ratios affect the performance of classification algorithms. Two oversampling algorithms, random oversampling and the Synthetic Minority Oversampling Technique (SMOTE), are used to adjust the imbalance ratio of the training dataset to 999:1, 99:1, 9:1, 3:1, and 1:1. Four classification methods, namely the Convolutional Neural Network, Vision Transformer, XGBoost, and CatBoost, are evaluated using precision, recall, AUC, and F2-score. We conduct more than 240 experiments and observe that the oversampling ratio has a significant positive impact on AUC and recall, but a negative impact on precision. The study also identifies the best oversampling algorithm and imbalance ratio for each classification algorithm. Notably, the Vision Transformer used in this study has not been employed in previous research on imbalanced data classification.
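The abstract does not include an implementation, so the following is only a minimal sketch of two ideas it mentions: random oversampling of the training set to a target majority:minority ratio, and the F2-score. The function names `random_oversample` and `fbeta`, the parameter `majority_per_minority`, and the toy data are assumptions made for illustration, not the authors' code. SMOTE is not shown; it would synthesize new minority points by interpolating between neighbors rather than duplicating existing rows.

```python
import numpy as np


def random_oversample(X, y, majority_per_minority, minority_label=1, seed=0):
    """Duplicate minority rows (sampled with replacement) until the
    majority:minority ratio is at most `majority_per_minority`:1.

    Hypothetical helper for illustration; apply it to training data only.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    if minority_idx.size == 0:
        raise ValueError("no minority samples to oversample")
    n_major = y.size - minority_idx.size
    # Number of minority samples needed to reach the requested ratio.
    target = int(np.ceil(n_major / majority_per_minority))
    n_extra = max(0, target - minority_idx.size)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(y.size), extra])
    return X[keep], y[keep]


def fbeta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


# Toy training set with a 999:1 imbalance (assumed data, not from the paper).
X = np.arange(1000).reshape(-1, 1)
y = np.zeros(1000, dtype=int)
y[0] = 1

# Oversample the single minority row until the ratio is 9:1.
X_bal, y_bal = random_oversample(X, y, majority_per_minority=9)
print(int((y_bal == 0).sum()), int((y_bal == 1).sum()))
```

Because F2 weights recall above precision, a model whose recall rises while its precision falls (the trend the abstract reports as the oversampling ratio increases) can still improve on F2, which is one reason to report both metrics alongside it.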