Cluster-based oversampling with area extraction from representative points for class imbalance learning
Authors: Zakarya Farou, Yizhi Wang, Tomáš Horváth
Journal: Intelligent Systems with Applications, volume 22, Article 200357
Publication date: 2024-03-16
Publication type: Journal Article
DOI: 10.1016/j.iswa.2024.200357
URL: https://www.sciencedirect.com/science/article/pii/S2667305324000334
Citations: 0
Abstract
Class imbalance learning is challenging in many domains where the training data contain a disproportionate number of samples from one class. Resampling methods have been used to adjust the class distribution, but they often struggle with small disjunct minority subsets. This paper introduces AROSS, an adaptive cluster-based oversampling approach that addresses these limitations. AROSS uses an optimized agglomerative clustering algorithm, together with the Cophenetic Correlation Coefficient and the Bayesian Information Criterion, to identify representative areas of the minority class. Safe and half-safe areas are obtained using an incremental k-Nearest Neighbor strategy, and oversampling is performed with a truncated hyperspherical Gaussian distribution. Experimental evaluations on 70 binary datasets show that AROSS improves class imbalance learning performance, making it a promising approach for mitigating class imbalance, especially for small disjunct minority subsets.
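The last step named in the abstract, oversampling from a truncated hyperspherical Gaussian, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `sample_truncated_gaussian`, the use of rejection sampling, and the default spread `sigma = radius / (2 * sqrt(d))` are assumptions made here for illustration. The center and radius of the area are taken as given inputs rather than derived by the clustering and incremental k-Nearest Neighbor steps.

```python
import math
import random


def sample_truncated_gaussian(center, radius, n_samples, sigma=None, rng=None):
    """Draw synthetic points around `center`, all within `radius` of it.

    Each candidate comes from an isotropic Gaussian centered at `center`.
    Candidates outside the hypersphere of the given radius are rejected,
    which truncates the Gaussian to that hypersphere.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    rng = rng or random.Random()
    dim = len(center)
    if sigma is None:
        # Assumption (not from the paper): scale the spread with the
        # dimension so a typical draw lands about halfway to the boundary
        # and rejections stay rare.
        sigma = radius / (2.0 * math.sqrt(dim))

    samples = []
    while len(samples) < n_samples:
        candidate = [c + rng.gauss(0.0, sigma) for c in center]
        # Keep the candidate only if it lies inside the hypersphere.
        if math.dist(candidate, center) <= radius:
            samples.append(candidate)
    return samples


if __name__ == "__main__":
    # Hypothetical minority-class representative point and area radius.
    rng = random.Random(0)
    for point in sample_truncated_gaussian([1.0, 2.0], 0.5, 5, rng=rng):
        print(point)
```

Rejection sampling is used here only because it is short and easy to check. Every returned point is guaranteed to lie inside the hypersphere, which is the property that keeps synthetic samples within an area judged safe for the minority class.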