A Positive Sample Enhancement Algorithm with Fuzzy Nearest Neighbor Hybridization for Imbalance Data

IF 3.6 | Zone 3, Computer Science | Q2 AUTOMATION & CONTROL SYSTEMS

International Journal of Fuzzy Systems Pub Date : 2024-06-03 DOI:10.1007/s40815-024-01721-3

Jiapeng Yang, Lei Shi, Tielin Lu, Lu Yuan, Nanchang Cheng, Xiaohui Yang, Jia Luo, Mingying Xu


Citations: 0

Abstract

The class imbalance problem is a critical research area in machine learning and deep learning and has received widespread attention from researchers. To address it, typical current methods use only positive samples to generate synthetic samples that resemble the minority class, ignoring the characteristic information of negative samples. Consequently, when the positive samples are too few and have highly similar features, the classifier is prone to fitting problems. In response, we propose a new positive sample enhancement algorithm (PENH) that addresses class imbalance by simulating the process of chromosome cross-fusion. We select the fuzzy negative sample set around each positive sample with the K-nearest neighbor algorithm and adopt Mixup (beyond empirical risk minimization) to randomly hybridize the positive sample with negative samples from that set. To overcome the sample imbalance, we adopt a One-class SVM overfitted to the positive samples to select among the newly generated unlabeled samples and obtain a balanced dataset. We conduct multiple experiments on 20 open datasets. The results show that PENH outperforms six baseline methods on multiple evaluation indicators.
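The neighbor-selection and hybridization steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names (`nearest_negatives`, `mixup`, `hybridize`), the Euclidean distance, the Beta(alpha, alpha) mixing weight, and the toy data are all assumptions made for the example. The One-class SVM selection step is not shown, since it would need an external library such as scikit-learn.

```python
import math
import random


def nearest_negatives(positive, negatives, k):
    """Return the k negative samples closest to `positive` (Euclidean distance)."""
    return sorted(negatives, key=lambda neg: math.dist(positive, neg))[:k]


def mixup(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b, feature by feature."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def hybridize(positives, negatives, k=3, n_new=10, alpha=0.5, seed=0):
    """Generate unlabeled candidates by mixing positives with nearby negatives.

    Assumption: the mixing weight is drawn from Beta(alpha, alpha), as in the
    original Mixup formulation; the paper's exact sampling scheme may differ.
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_new):
        pos = rng.choice(positives)
        # Hypothetical stand-in for the "fuzzy negative sample set": the k
        # negatives nearest to the chosen positive sample.
        neg = rng.choice(nearest_negatives(pos, negatives, k))
        lam = rng.betavariate(alpha, alpha)
        candidates.append(mixup(pos, neg, lam))
    return candidates


# Toy data (made up): two positives near the origin, two nearby negatives,
# and two distant negatives that should never be used for mixing.
positives = [[0.0, 0.0], [0.2, 0.1]]
negatives = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [6.0, 5.5]]
candidates = hybridize(positives, negatives, k=2, n_new=5)
```

Each candidate lies on the line segment between a positive sample and one of its nearest negatives, so it carries feature information from both classes. In the pipeline the abstract describes, these unlabeled candidates would then be passed to a One-class SVM fitted on the positive samples, and only the accepted ones would be added to the minority class.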

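Source journal

International Journal of Fuzzy Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 7.80

Self-citation rate: 9.30%

Articles published: 188

Review time: 16 months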

Journal description: The International Journal of Fuzzy Systems (IJFS) is an official journal of the Taiwan Fuzzy Systems Association (TFSA) and is published semi-quarterly. IJFS considers high-quality papers that deal with the theory, design, and application of fuzzy systems, soft computing systems, grey systems, and extension theory systems, ranging from hardware to software. Survey and expository submissions are also welcome.