Authors: Payel Sadhukhan
Journal: Applied Soft Computing, volume 175, Article 113095
Publication date: 2025-04-09
Publication type: Journal Article
DOI: 10.1016/j.asoc.2025.113095
URL: https://www.sciencedirect.com/science/article/pii/S1568494625004065
Impact factor: 7.2 (JCR Q1, Computer Science, Artificial Intelligence)
A framework to undersample and refine the synthetic minority set
Oversampling the minority class is a popular strategy for coping with imbalanced datasets. It improves recognition of the minority points to an acceptable extent. However, the synthetic minority instances increase the overlap between the majority class and the augmented minority class, which harms the correct recognition of both classes. To address this, this paper introduces a novel strategy to undersample the synthetic minority set. A multi-armed bandit (MAB) guided protocol is followed to (i) identify the synthetic minority instances that contribute to the increased overlap between the two classes and (ii) remove (undersample) them iteratively to obtain a refined synthetic minority set. Simulation on synthetic datasets shows that the proposed strategy increases the Gromov–Wasserstein distance between the original majority class distribution and the synthetic minority points' distribution, compared with the regular oversampled data obtained through state-of-the-art techniques. Empirical evaluation on sixteen real-world datasets, with four state-of-the-art minority oversamplers and two refinement techniques, demonstrates the competence of the proposed strategy over baseline results and against the two competing methods. The proposed strategy improves performance on the majority class without reducing the minority class's performance and can be incorporated in sensitive real-world domains.
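The abstract does not say how the bandit's arms, rewards, or selection policy are defined, so the following is only a toy sketch of the general idea (oversample, then use a bandit to find and remove overlapping synthetic points), not the paper's algorithm. Everything in it is an assumption made for illustration: the simplified SMOTE-style interpolation, treating each seed minority point as an arm, the nearest-neighbour overlap test used as the reward, the UCB1 policy, and all function names. It also does not compute the Gromov–Wasserstein distance mentioned above.

```python
import math
import random


def nearest_is_majority(p, majority, minority):
    """Assumed overlap test: the closest ORIGINAL point to p is a majority point."""
    d_maj = min(math.dist(p, q) for q in majority)
    d_min = min(math.dist(p, q) for q in minority)
    return d_maj < d_min


def oversample(minority, n_new, rng):
    """Simplified SMOTE-style interpolation between random minority pairs.

    Returns {seed_index: [synthetic points]} so that each seed can act as an arm.
    """
    groups = {i: [] for i in range(len(minority))}
    for _ in range(n_new):
        i, j = rng.sample(range(len(minority)), 2)
        t = rng.random()
        a, b = minority[i], minority[j]
        groups[i].append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return groups


def refine(groups, majority, minority, budget, rng):
    """UCB1 over seed groups (hypothetical arms).

    Each pull inspects one random synthetic point of the chosen arm.
    Reward is 1 if the point overlaps the majority class, in which case
    the point is removed from the synthetic set.
    """
    pulls = {a: 0 for a in groups}
    wins = {a: 0 for a in groups}
    kept = {a: list(pts) for a, pts in groups.items()}

    for t in range(1, budget + 1):
        active = [a for a in kept if kept[a]]
        if not active:
            break

        def ucb(a):
            if pulls[a] == 0:
                return float("inf")  # try every arm at least once
            return wins[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])

        arm = max(active, key=ucb)
        idx = rng.randrange(len(kept[arm]))
        pulls[arm] += 1
        if nearest_is_majority(kept[arm][idx], majority, minority):
            wins[arm] += 1
            kept[arm].pop(idx)  # undersample the overlapping synthetic point

    return [p for pts in kept.values() for p in pts]


rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]

groups = oversample(minority, 50, rng)
synthetic = [p for pts in groups.values() for p in pts]
refined = refine(groups, majority, minority, budget=200, rng=rng)


def overlap_count(points):
    return sum(nearest_is_majority(p, majority, minority) for p in points)


print("synthetic points:", len(synthetic), "overlapping:", overlap_count(synthetic))
print("refined points:  ", len(refined), "overlapping:", overlap_count(refined))
```

In this sketch the bandit spends more of its inspection budget on seeds whose synthetic children frequently land near majority points, which is one plausible reading of "MAB guided" removal. A real implementation would follow the definitions given in the full paper.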
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.