Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang
{"title":"不平衡分类欠采样集成的广义优化嵌入框架","authors":"Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang","doi":"10.1109/DSAA53316.2021.9564116","DOIUrl":null,"url":null,"abstract":"Imbalanced classification exists commonly in practical applications, and it has always been a challenging issue. Traditional classification methods have poor performance on imbalanced data, especially, on the minority class. However, the minority class is usually of our interest, and its misclassification cost is higher. The critical factor is the intrinsic complicated distribution characteristics in imbalanced data itself. Resampling ensemble learning achieves promising results and is a research focus recently. However, some resampling ensembles do not consider complicated distribution characteristics, thus limiting the performance improvement. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF aims to pay more attention to the learning of local regions to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize base classifiers. The optimization can focus on a single class or both classes. Extensive experiments over synthetic and real datasets demonstrate that GOEF with the minority class optimization performs the best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification\",\"authors\":\"Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang\",\"doi\":\"10.1109/DSAA53316.2021.9564116\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Imbalanced classification exists commonly in practical applications, and it has always been a challenging issue. Traditional classification methods have poor performance on imbalanced data, especially, on the minority class. However, the minority class is usually of our interest, and its misclassification cost is higher. The critical factor is the intrinsic complicated distribution characteristics in imbalanced data itself. Resampling ensemble learning achieves promising results and is a research focus recently. However, some resampling ensembles do not consider complicated distribution characteristics, thus limiting the performance improvement. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF aims to pay more attention to the learning of local regions to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize base classifiers. The optimization can focus on a single class or both classes. 
Extensive experiments over synthetic and real datasets demonstrate that GOEF with the minority class optimization performs the best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.\",\"PeriodicalId\":129612,\"journal\":{\"name\":\"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSAA53316.2021.9564116\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSAA53316.2021.9564116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification
Imbalanced classification is common in practical applications and has long been a challenging issue. Traditional classification methods perform poorly on imbalanced data, especially on the minority class. However, the minority class is usually the class of interest, and its misclassification cost is higher. The critical factor is the intrinsically complicated distribution characteristics of imbalanced data itself. Resampling ensemble learning has achieved promising results and has recently become a research focus. However, some resampling ensembles do not consider these complicated distribution characteristics, which limits their performance. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF pays more attention to learning local regions in order to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize the base classifiers. The optimization can focus on a single class or on both classes. Extensive experiments on synthetic and real datasets demonstrate that GOEF with minority-class optimization performs best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.
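To make the general idea concrete, the following is a minimal sketch of an undersampling-bagging ensemble with an out-of-bag (OOB) refinement step in the spirit of the abstract: each base classifier is trained on a balanced undersample, then misclassified OOB minority examples are added back and the classifier is refit. This is not the authors' implementation; the class name, the refit strategy, the decision-tree base learner, and the assumption of binary 0/1 labels are all illustrative choices.

```python
# Sketch of undersampling bagging with OOB-based single-class optimization.
# Assumes binary labels {0, 1}; names and refit strategy are illustrative only.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


class UndersamplingBaggingSketch:
    def __init__(self, n_estimators=10, base_estimator=None, random_state=0):
        self.n_estimators = n_estimators
        self.base_estimator = base_estimator or DecisionTreeClassifier()
        self.rng = np.random.RandomState(random_state)
        self.estimators_ = []

    def fit(self, X, y, minority_label=1):
        min_idx = np.where(y == minority_label)[0]
        maj_idx = np.where(y != minority_label)[0]
        for _ in range(self.n_estimators):
            # Undersampling bagging: draw a majority subsample of minority size.
            bag_maj = self.rng.choice(maj_idx, size=len(min_idx), replace=False)
            bag = np.concatenate([min_idx, bag_maj])
            clf = clone(self.base_estimator).fit(X[bag], y[bag])

            # Out-of-bag examples were not used to train this base classifier.
            oob = np.setdiff1d(np.arange(len(y)), bag)
            if len(oob) > 0:
                # "Minority-class optimization" (illustrative): collect misclassified
                # OOB minority examples and refit with them added to the bag.
                wrong = oob[(clf.predict(X[oob]) != y[oob]) & (y[oob] == minority_label)]
                if len(wrong) > 0:
                    aug = np.concatenate([bag, wrong])
                    clf = clone(self.base_estimator).fit(X[aug], y[aug])
            self.estimators_.append(clf)
        return self

    def predict(self, X):
        # Simple majority vote over the base classifiers.
        votes = np.stack([clf.predict(X) for clf in self.estimators_])
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

A both-classes variant would simply drop the `y[oob] == minority_label` filter so that misclassified OOB examples from either class are fed back into the refit.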