A Positive Sample Enhancement Algorithm with Fuzzy Nearest Neighbor Hybridization for Imbalance Data

Impact Factor 3.6, Zone 3 (Computer Science), JCR Q2 (AUTOMATION & CONTROL SYSTEMS)
Jiapeng Yang, Lei Shi, Tielin Lu, Lu Yuan, Nanchang Cheng, Xiaohui Yang, Jia Luo, Mingying Xu
Journal: International Journal of Fuzzy Systems
Publication date: 2024-06-03
DOI: 10.1007/s40815-024-01721-3 (https://doi.org/10.1007/s40815-024-01721-3)
Citations: 0

Abstract

The class imbalance problem is a critical research area in machine learning and deep learning and has received widespread attention from researchers. Typical current methods address it by using only positive samples to generate synthetic samples that resemble the minority class, ignoring the characteristic information of negative samples. Consequently, when the positive samples are too few and have highly similar features, the classifier suffers from fitting problems. In response, we propose a new positive sample enhancement algorithm (PENH) that tackles class imbalance by simulating the process of chromosome cross-fusion. We use the K-nearest neighbor algorithm to select the fuzzy negative sample set around each positive sample, and adopt Mixup (the "beyond empirical risk minimization" technique) to randomly hybridize the positive sample with a negative sample from that set. To overcome the sample imbalance, we use a One-class SVM overfitted to the positive samples to select among the newly generated unlabeled samples, which yields a balanced dataset. We conduct multiple experiments on 20 open datasets. The results show that PENH outperforms six baseline methods on multiple evaluation indicators.
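The three steps named in the abstract (nearest-neighbor selection of negatives, Mixup hybridization, One-class SVM filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `penh_sketch`, the parameters `k`, `alpha`, and `oversample`, the Beta-distributed mixing weight, and the One-class SVM hyperparameters are all assumptions chosen for illustration, and the abstract does not specify how the SVM is made to overfit the positives.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM


def penh_sketch(X_pos, X_neg, n_needed, k=5, alpha=1.0, oversample=10, seed=0):
    """Return up to n_needed synthetic positives (illustrative sketch only)."""
    if n_needed <= 0:
        return np.empty((0, X_pos.shape[1]))
    rng = np.random.default_rng(seed)

    # Step 1: for each positive sample, find its k nearest negative samples.
    k = min(k, len(X_neg))
    nn = NearestNeighbors(n_neighbors=k).fit(X_neg)
    _, neg_idx = nn.kneighbors(X_pos)  # shape (n_pos, k)

    # Step 2: Mixup a random positive with one of its neighboring negatives.
    # The weight lam ~ Beta(alpha, alpha) is the usual Mixup choice (assumed here).
    n_cand = n_needed * oversample
    pos_i = rng.integers(0, len(X_pos), size=n_cand)
    nbr_j = rng.integers(0, k, size=n_cand)
    lam = rng.beta(alpha, alpha, size=n_cand)[:, None]
    candidates = lam * X_pos[pos_i] + (1.0 - lam) * X_neg[neg_idx[pos_i, nbr_j]]

    # Step 3: fit a One-class SVM on positives only and keep the candidates
    # it labels as inliers (+1). Hyperparameters are assumed, not from the paper.
    ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_pos)
    accepted = candidates[ocsvm.predict(candidates) == 1]
    return accepted[:n_needed]


# Usage on toy 2-D data: 20 positives versus 200 negatives.
rng = np.random.default_rng(42)
X_pos = rng.normal(0.0, 0.5, size=(20, 2))
X_neg = rng.normal(3.0, 1.0, size=(200, 2))
synth = penh_sketch(X_pos, X_neg, n_needed=len(X_neg) - len(X_pos))
X_balanced = np.vstack([X_neg, X_pos, synth])
y_balanced = np.r_[np.zeros(len(X_neg)), np.ones(len(X_pos) + len(synth))]
```

Because the filter may reject many candidates, the sketch oversamples and can return fewer than `n_needed` samples; a fuller version would loop until the classes are balanced.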


Source journal: International Journal of Fuzzy Systems
Category: Engineering and Technology, Computer Science: Artificial Intelligence
CiteScore: 7.80
Self-citation rate: 9.30%
Articles published: 188
Review time: 16 months
About the journal: The International Journal of Fuzzy Systems (IJFS) is an official journal of the Taiwan Fuzzy Systems Association (TFSA) and is published semi-quarterly. IJFS will consider high-quality papers that deal with the theory, design, and application of fuzzy systems, soft computing systems, grey systems, and extension theory systems ranging from hardware to software. Survey and expository submissions are also welcome.