Undersampling Near Decision Boundary for Imbalance Problems

Jianjun Zhang, Ting Wang, Wing W. Y. Ng, Shuai Zhang, C. Nugent
2019 International Conference on Machine Learning and Cybernetics (ICMLC)
DOI: 10.1109/ICMLC48188.2019.8949290
Publication date: 2019-07-01
Citations: 14

Abstract

Undersampling a dataset to rebalance its class distribution is an effective way to handle class imbalance problems. However, randomly removing majority examples under a uniform distribution may cause unnecessary information loss, degrading the performance of classifiers trained on the rebalanced dataset. On the other hand, examples differ in their sensitivity to class imbalance: a higher sensitivity means an example is more easily affected by class imbalance, and this can guide the selection of examples to rebalance the class distribution and boost classifier performance. Therefore, in this paper we propose a novel undersampling method, UnderSampling using Sensitivity (USS), based on the sensitivity of each majority example. Examples with low sensitivities are noisy or safe examples, while examples with high sensitivities are borderline examples. In USS, majority examples with higher sensitivities are more likely to be selected. Experiments on 20 datasets confirm the superiority of USS over one baseline method and five resampling methods.