A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification

Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang
{"title":"不平衡分类欠采样集成的广义优化嵌入框架","authors":"Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang","doi":"10.1109/DSAA53316.2021.9564116","DOIUrl":null,"url":null,"abstract":"Imbalanced classification exists commonly in practical applications, and it has always been a challenging issue. Traditional classification methods have poor performance on imbalanced data, especially, on the minority class. However, the minority class is usually of our interest, and its misclassification cost is higher. The critical factor is the intrinsic complicated distribution characteristics in imbalanced data itself. Resampling ensemble learning achieves promising results and is a research focus recently. However, some resampling ensembles do not consider complicated distribution characteristics, thus limiting the performance improvement. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF aims to pay more attention to the learning of local regions to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize base classifiers. The optimization can focus on a single class or both classes. Extensive experiments over synthetic and real datasets demonstrate that GOEF with the minority class optimization performs the best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification\",\"authors\":\"Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang\",\"doi\":\"10.1109/DSAA53316.2021.9564116\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Imbalanced classification exists commonly in practical applications, and it has always been a challenging issue. Traditional classification methods have poor performance on imbalanced data, especially, on the minority class. However, the minority class is usually of our interest, and its misclassification cost is higher. The critical factor is the intrinsic complicated distribution characteristics in imbalanced data itself. Resampling ensemble learning achieves promising results and is a research focus recently. However, some resampling ensembles do not consider complicated distribution characteristics, thus limiting the performance improvement. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF aims to pay more attention to the learning of local regions to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize base classifiers. The optimization can focus on a single class or both classes. 
Extensive experiments over synthetic and real datasets demonstrate that GOEF with the minority class optimization performs the best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.\",\"PeriodicalId\":129612,\"journal\":{\"name\":\"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSAA53316.2021.9564116\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSAA53316.2021.9564116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Imbalanced classification is common in practical applications and has long been a challenging issue. Traditional classification methods perform poorly on imbalanced data, especially on the minority class. However, the minority class is usually the class of interest, and its misclassification cost is higher. The critical factor is the intrinsically complicated distribution characteristics of the imbalanced data itself. Resampling ensemble learning achieves promising results and has recently been a research focus. However, some resampling ensembles do not consider these complicated distribution characteristics, which limits their performance. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF pays more attention to learning local regions in order to handle the complicated distribution characteristics. Specifically, the GOEF uses out-of-bag data to explore heterogeneous local areas and selects misclassified examples to optimize the base classifiers. The optimization can focus on a single class or on both classes. Extensive experiments on synthetic and real datasets demonstrate that GOEF with minority-class optimization performs best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.
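The abstract only sketches the mechanism, so the following is a minimal, hedged illustration of the general idea in Python: an undersampling-bagging ensemble in which each base classifier is refit after adding the out-of-bag (OOB) minority examples it misclassifies. The sampling scheme (bootstrapping the minority class so that some minority examples stay out-of-bag) and the refit-with-augmented-bag step are assumptions made for illustration; they are not the authors' exact GOEF optimization.

```python
# A minimal sketch of an undersampling-bagging ensemble with an
# out-of-bag (OOB) optimization step, loosely following the idea in
# the abstract. The concrete refit strategy below is an illustrative
# assumption, not the authors' exact GOEF procedure.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def undersample_bag(y, rng, minority_label=1):
    """Draw a roughly balanced bag: a bootstrap sample of the minority
    class plus an equal-sized random subsample of the majority class.
    Bootstrapping the minority class leaves some minority examples
    out-of-bag, which the optimization step relies on."""
    min_idx = np.where(y == minority_label)[0]
    maj_idx = np.where(y != minority_label)[0]
    boot_min = rng.choice(min_idx, size=len(min_idx), replace=True)
    sub_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    bag = np.concatenate([boot_min, sub_maj])
    oob = np.setdiff1d(np.arange(len(y)), bag)
    return bag, oob


def fit_undersampling_ensemble(X, y, n_estimators=10, minority_label=1, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_estimators):
        bag, oob = undersample_bag(y, rng, minority_label)
        clf = DecisionTreeClassifier(random_state=seed)
        clf.fit(X[bag], y[bag])

        # Minority-class optimization: find OOB minority examples the
        # base classifier misclassifies and refit with them added to
        # the bag, so the learner concentrates on hard local regions.
        if len(oob) > 0:
            oob_pred = clf.predict(X[oob])
            hard = oob[(y[oob] == minority_label) & (oob_pred != y[oob])]
            if len(hard) > 0:
                aug = np.concatenate([bag, hard])
                clf.fit(X[aug], y[aug])
        ensemble.append(clf)
    return ensemble


def predict_ensemble(ensemble, X):
    """Majority vote of the base classifiers (assumes 0/1 labels)."""
    votes = np.stack([clf.predict(X) for clf in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

A two-class variant of the optimization would simply drop the `y[oob] == minority_label` filter, so that misclassified out-of-bag examples from either class are added back before refitting.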