Online sequential classification of imbalanced data by combining extreme learning machine and improved SMOTE algorithm

Wentao Mao, Jinwan Wang, Liyun Wang
Published in: 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8
Publication date: 2015-07-12
DOI: 10.1109/IJCNN.2015.7280620 (https://doi.org/10.1109/IJCNN.2015.7280620)
Citations: 14

Abstract

Data imbalance is becoming increasingly prominent in applications of machine learning and pattern recognition. Many traditional machine learning methods struggle with imbalanced data that are also collected in an online sequential manner. To obtain fast and efficient classification in this setting, a new online sequential extreme learning machine method with a sequential SMOTE strategy is proposed. The key idea is to use the distribution characteristics of online sequential data to reduce the randomness involved in generating virtual minority samples. Using the online sequential extreme learning machine (OS-ELM) as the baseline algorithm, the method has two stages. In the offline stage, a principal curve is introduced to model each class's distribution, and virtual samples are then generated on that basis by the synthetic minority over-sampling technique (SMOTE). In the online stage, the membership for each class is determined according to the projection distance from a sample to the principal curve. Using these memberships, redundant majority samples and unreasonable virtual minority samples are excluded, which reduces the level of imbalance in the online stage. The proposed method is evaluated on four UCI datasets and a real-world air pollutant forecasting dataset. The experimental results show that the proposed method outperforms the classical ELM, OS-ELM, and SMOTE-based OS-ELM in terms of generalization performance and numerical stability.
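The abstract names two standard building blocks, SMOTE and OS-ELM, without giving implementation details. The sketch below illustrates only those textbook components, as a rough aid to reading: classic SMOTE interpolation between minority neighbours, and the recursive least-squares update commonly used for OS-ELM. It does not reproduce the paper's principal-curve modeling or membership-based sample filtering, which the abstract does not specify. The names `smote` and `OSELM`, the sigmoid activation, and the small `ridge` term added for numerical invertibility are all assumptions made for this illustration.

```python
import numpy as np


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE (not the paper's improved variant): each synthetic
    point lies on the segment between a minority sample and one of its
    k nearest minority neighbours. Assumes at least two minority samples."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # seed samples
    mate = neighbours[base, rng.integers(0, k, size=n_new)]  # chosen neighbour
    gap = rng.random((n_new, 1))                           # interpolation weight in [0, 1)
    return minority[base] + gap * (minority[mate] - minority[base])


class OSELM:
    """Textbook-style OS-ELM sketch: random fixed hidden layer, output
    weights updated by recursive least squares as new chunks arrive."""

    def __init__(self, n_inputs, n_hidden, n_outputs, rng=None):
        rng = np.random.default_rng(rng)
        self.W = rng.uniform(-1.0, 1.0, (n_inputs, n_hidden))  # never trained
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.P = None
        self.beta = np.zeros((n_hidden, n_outputs))

    def _hidden(self, X):
        # Sigmoid hidden-layer output (activation choice is an assumption).
        return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ self.W + self.b)))

    def fit_initial(self, X, T, ridge=1e-3):
        # Offline stage: regularised least squares on the initial batch.
        H = self._hidden(X)
        self.P = np.linalg.inv(H.T @ H + ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T

    def partial_fit(self, X, T):
        # Online stage: update P and beta from one new chunk only.
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(len(H)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

In a pipeline of this general shape, one might oversample the minority class of the initial batch with `smote` before calling `fit_initial`, then feed later chunks through `partial_fit`. How the proposed method selects or discards samples at each stage is described in the paper itself, not here.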