A framework to undersample and refine the synthetic minority set

Impact factor: 7.2 | Zone 1, Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Payel Sadhukhan
Applied Soft Computing, Volume 175, Article 113095. Published 2025-04-09. DOI: 10.1016/j.asoc.2025.113095
Full text: https://www.sciencedirect.com/science/article/pii/S1568494625004065
Citations: 0

Abstract

Oversampling the minority class is a popular strategy for coping with imbalanced datasets. It improves recognition of the minority points to an admissible extent. Nonetheless, the synthetic minority instances accentuate the overlap between the majority class and the augmented minority class, which is detrimental to the correct recognition of both classes. To this end, this paper introduces a novel strategy to undersample the synthetic minority set. A multi-armed bandit (MAB) guided protocol is followed to [i] identify the synthetic minority instances that contribute to the increased overlap between the two classes and [ii] subsequently remove (undersample) them iteratively to obtain a refined synthetic minority set. Simulation on synthetic datasets shows that the proposed strategy increases the Gromov–Wasserstein distance between the original majority class distribution and the synthetic minority points' distribution (compared with the regular oversampled data obtained through state-of-the-art techniques). Empirical evaluation on sixteen real-world datasets, with four state-of-the-art minority oversamplers and two refinement techniques, demonstrates the competence of the proposed strategy over baseline results and against the two competing methods. The proposed strategy improves the performance of the majority class without bringing down the minority class's performance and can be incorporated in sensitive real-world domains.
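The abstract does not spell out the bandit protocol, so the following is only an illustrative sketch of the two steps it names: scoring synthetic minority points for overlap, then removing them iteratively. It is not the paper's algorithm. Everything specific here is an assumption: UCB1 as the bandit rule, one arm per synthetic point, a reward of 1 when a randomly drawn near neighbour among the original points belongs to the majority class, a plain interpolation routine standing in for a real oversampler, and the hypothetical names `smote_like` and `refine_synthetic`.

```python
import numpy as np

def smote_like(X_min, n_new, rng):
    """Stand-in oversampler (assumption): interpolate between random minority pairs."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_min), size=n_new)
    t = rng.random((n_new, 1))
    return X_min[i] + t * (X_min[j] - X_min[i])

def refine_synthetic(X_maj, X_min, X_syn, k=5, n_drop=20, pulls_per_round=50, seed=0):
    """Hypothetical UCB1-guided refinement: drop one high-overlap synthetic point per round."""
    rng = np.random.default_rng(seed)
    X_orig = np.vstack([X_maj, X_min])
    is_maj = np.r_[np.ones(len(X_maj)), np.zeros(len(X_min))]
    # Indices of the k nearest original points for every synthetic point (brute force).
    dist = np.linalg.norm(X_syn[:, None, :] - X_orig[None, :, :], axis=2)
    nn = np.argsort(dist, axis=1)[:, :k]
    n = len(X_syn)

    def pull(arm):
        # Assumed reward: 1.0 if a randomly chosen near neighbour is a majority point.
        return is_maj[nn[arm, rng.integers(k)]]

    sums = np.array([pull(a) for a in range(n)])  # pull every arm once to initialise
    counts = np.ones(n)
    active = np.ones(n, dtype=bool)
    t = n
    for _ in range(min(n_drop, n - 1)):
        for _ in range(pulls_per_round):
            t += 1
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(np.where(active, ucb, -np.inf)))
            sums[arm] += pull(arm)
            counts[arm] += 1
        # Remove the active point with the highest estimated overlap.
        means = np.where(active, sums / counts, -np.inf)
        active[int(np.argmax(means))] = False
    return X_syn[active], active

# Toy data: two overlapping Gaussian classes in 2-D.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(2.0, 1.0, size=(20, 2))
X_syn = smote_like(X_min, 100, rng)
X_ref, kept = refine_synthetic(X_maj, X_min, X_syn)
print(X_syn.shape, X_ref.shape)
```

Removing one point per round, rather than a single batch at the end, mirrors the iterative removal the abstract mentions. The round count, the pull budget, and the neighbourhood size `k` are arbitrary choices made for the toy data.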
Source journal: Applied Soft Computing
Category: Engineering & Technology, Computer Science: Interdisciplinary Applications
CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months
About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site is continuously updated with new articles and the publication time is short.