Active learning for imbalance problem using L-GEM of RBFNN

Junjie Hu
2012 International Conference on Machine Learning and Cybernetics
Publication date: 2012-07-15
DOI: 10.1109/ICMLC.2012.6358972 (https://doi.org/10.1109/ICMLC.2012.6358972)
Citations: 7

Abstract

In many important applications, such as malignant cell detection, network intrusion detection, and error signal detection in power systems, the distributions of the positive and negative classes are usually imbalanced. Many classifiers perform poorly on imbalanced data. The major problem is that classifiers tend to ignore the samples of the minority class, and hence its accuracy, without regard to the higher cost of misclassifying that class. Pattern classification for imbalanced data has therefore become an active challenge for both academia and industry. In this paper, we propose an active learning method for imbalanced data using a stochastic sensitivity measure (ST-SM) of a Radial Basis Function Neural Network (RBFNN). A large ST-SM indicates that the RBFNN is uncertain and yields a large output fluctuation around a particular sample. In each round, the samples yielding large ST-SM values are selected and added to the training set. Empirically, samples with large output perturbation (i.e., large ST-SM) should be located near the classification boundary and are of great significance for training the classifier. With respect to the imbalance of the data set, the ST-SM should reduce the number of redundant majority-class samples selected, rebalance the sample distribution of the training set, and ultimately improve the performance of the classifier.
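The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the ST-SM formula, so the sketch assumes a Monte Carlo estimate of the expected squared output change under small uniform input perturbations. The function names, the perturbation half-width `q`, the shared Gaussian `width`, and the least-squares fit of the output weights are all assumptions made for this example.

```python
import numpy as np

def rbf_features(X, centers, width):
    # Gaussian activation of every sample at every hidden-unit center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbfnn(X, y, centers, width, ridge=1e-6):
    # Assumed training rule: ridge-regularized least squares on the
    # output weights, with class targets encoded as +1 / -1.
    H = rbf_features(X, centers, width)
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ y)

def predict(X, centers, width, w):
    return rbf_features(X, centers, width) @ w

def estimate_st_sm(X, centers, width, w, q=0.1, n_draws=50, rng=None):
    # Assumed stand-in for ST-SM: Monte Carlo estimate of
    # E[(f(x + dx) - f(x))^2] with dx drawn uniformly from [-q, q]^d.
    rng = np.random.default_rng() if rng is None else rng
    base = predict(X, centers, width, w)
    total = np.zeros(len(X))
    for _ in range(n_draws):
        dx = rng.uniform(-q, q, size=X.shape)
        total += (predict(X + dx, centers, width, w) - base) ** 2
    return total / n_draws

def select_queries(X_pool, centers, width, w, k, **kwargs):
    # Indices of the k pool samples with the largest estimated sensitivity.
    scores = estimate_st_sm(X_pool, centers, width, w, **kwargs)
    return np.argsort(scores)[::-1][:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy imbalanced pool: 200 majority (-1) and 20 minority (+1) points.
    X_pool = np.vstack([rng.normal(-1.0, 1.0, (200, 2)),
                        rng.normal(1.5, 0.5, (20, 2))])
    y_pool = np.concatenate([-np.ones(200), np.ones(20)])
    labeled = rng.choice(len(X_pool), size=10, replace=False)
    centers, width = X_pool[labeled], 1.0
    w = fit_rbfnn(X_pool[labeled], y_pool[labeled], centers, width)
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    picked = unlabeled[select_queries(X_pool[unlabeled], centers, width, w,
                                      k=5, rng=rng)]
    print("queried pool indices:", picked)
```

In a full loop, the queried samples would be labeled, appended to the training set, and the network refitted before the next round. Samples far from every center produce almost no output change under perturbation, so they are rarely selected, which is one way to read the abstract's claim about avoiding redundant majority-class samples.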