多类分类中有效过采样和欠采样的混合聚类策略。

IF 3.9 2区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES
Amirreza Salehi, Majid Khedmati
{"title":"多类分类中有效过采样和欠采样的混合聚类策略。","authors":"Amirreza Salehi, Majid Khedmati","doi":"10.1038/s41598-024-84786-2","DOIUrl":null,"url":null,"abstract":"<p><p>Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, this algorithm retains the most information during undersampling while generating efficient data in the minority class. The classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experimentation was carried out on 30 datasets to evaluate the proposed algorithm's performance. The results were subsequently compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, The HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.</p>","PeriodicalId":21811,"journal":{"name":"Scientific Reports","volume":"15 1","pages":"3460"},"PeriodicalIF":3.9000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11772689/pdf/","citationCount":"0","resultStr":"{\"title\":\"Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.\",\"authors\":\"Amirreza Salehi, Majid Khedmati\",\"doi\":\"10.1038/s41598-024-84786-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, this algorithm retains the most information during undersampling while generating efficient data in the minority class. The classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experimentation was carried out on 30 datasets to evaluate the proposed algorithm's performance. The results were subsequently compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, The HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.</p>\",\"PeriodicalId\":21811,\"journal\":{\"name\":\"Scientific Reports\",\"volume\":\"15 1\",\"pages\":\"3460\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-01-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11772689/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific Reports\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1038/s41598-024-84786-2\",\"RegionNum\":2,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Reports","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41598-024-84786-2","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

摘要

在现实世界的数据集中,多类不平衡是一个具有挑战性的问题,其中某些类的样本数量可能很低,因为它们对应于罕见的情况。为了解决多类不平衡问题,本文提出了一种新的基于聚类的过采样和欠采样混合技术。该算法通过聚类并将类划分为多数类和少数类,在欠采样过程中保留最多的信息,同时在少数类中生成高效的数据。分类采用一对一和一对全分解方案进行。在30个数据集上进行了大量的实验来评估该算法的性能。结果随后与几个最先进的算法进行了比较。结果表明,该算法在不同场景下均优于竞争算法。最后,HCBOU算法在不同的类不平衡水平上表现出鲁棒性,突出了其在处理不平衡数据集方面的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.

Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.

Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.

Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification.

Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, this algorithm retains the most information during undersampling while generating efficient data in the minority class. The classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experimentation was carried out on 30 datasets to evaluate the proposed algorithm's performance. The results were subsequently compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, The HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Scientific Reports
Scientific Reports Natural Science Disciplines-
CiteScore
7.50
自引率
4.30%
发文量
19567
审稿时长
3.9 months
期刊介绍: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor: 4.380 (2021), and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021). •Engineering Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world''s biggest challenges, helping to save lives and improve the way we live. •Physical sciences Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature — often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics. •Earth and environmental sciences Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems. •Biological sciences Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms from microorganisms, animals to plants. •Health sciences The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信