Huajuan Ren, Shuaimin Ren, Lin Yan, Ruimin Wang, Jing Jing, Jiaqi Shi
2021 2nd International Conference on Artificial Intelligence and Education (ICAIE), published 2021-06-01. DOI: 10.1109/ICAIE53562.2021.00140 (https://doi.org/10.1109/ICAIE53562.2021.00140)
An Ensemble Approach with Clustering-Based Under-sampling for Imbalanced Classification
Class imbalance occurs widely in real-world applications and hinders recognition of the important class. Ensemble methods combined with resampling are effective at alleviating the class imbalance problem. This paper presents CNBoost, a new ensemble approach with clustering-based under-sampling for learning from imbalanced data. The algorithm combines Centers NN with a boosting procedure. Centers NN, an under-sampling method that uses the nearest neighbors of cluster centers, provides a new training subset in each boosting iteration, so that the base learner learns the overall data distribution in every iteration. We compared the performance of the proposed algorithm with 3 popular ensemble methods. Across 10 datasets and 3 measurements, CNBoost performs as well as or better than the other 3 methods in 25 of the 30 dataset-measurement combinations. In addition, we discuss how the base learner used in boosting affects the performance of these algorithms. The results show that CNBoost is a promising approach with high classification accuracy and stability for dealing with imbalanced datasets.
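The abstract does not spell out the Centers NN procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the majority class is clustered with plain k-means, that the number of clusters equals the number of samples to keep, and that the real majority sample nearest to each center is retained. The function names `kmeans` and `centers_nn_undersample`, the `seed` parameter, and the toy data are all hypothetical.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns the list of k cluster centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its closest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def centers_nn_undersample(majority, n_keep, seed=0):
    """Keep the real majority samples that lie nearest to the k-means centers.

    Assumption: one nearest neighbor per center, chosen without replacement.
    """
    rng = random.Random(seed)
    centers = kmeans(majority, n_keep, rng)
    remaining = list(majority)
    kept = []
    for c in centers:
        nearest = min(remaining, key=lambda p: math.dist(p, c))
        kept.append(nearest)
        remaining.remove(nearest)  # avoid selecting the same sample twice
    return kept


# Toy data (made up for illustration): 6 majority and 2 minority samples.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
minority = [(2.5, 2.5), (2.6, 2.4)]

# Under-sample the majority class down to the size of the minority class.
subset = centers_nn_undersample(majority, n_keep=len(minority))
balanced = subset + minority
```

In a boosting loop, a call like this could be repeated in every round (for example with a different `seed`) to hand the base learner a fresh balanced subset. How CNBoost actually varies the subset between rounds is not stated in the abstract.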