A quantum-based oversampling method for classification of highly imbalanced and overlapped data.

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS
ACS Applied Bio Materials. Pub Date: 2023-12-01. Epub Date: 2024-01-28. DOI: 10.1177/15353702231220665
Bei Yang, Guilan Tian, Joseph Luttrell, Ping Gong, Chaoyang Zhang
Publication type: Journal Article. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10854475/pdf/
Citations: 0

Abstract

Data imbalance is a challenging problem in classification tasks, and when combined with class overlapping, it further deteriorates classification performance. However, existing studies have rarely addressed both issues simultaneously. In this article, we propose a novel quantum-based oversampling method (QOSM) to effectively tackle data imbalance and class overlapping, thereby improving classification performance. QOSM utilizes the quantum potential theory to calculate the potential energy of each sample and selects the sample with the lowest potential as the center of each cover generated by a constructive covering algorithm. This approach optimizes cover center selection and better captures the distribution of the original samples, particularly in the overlapping regions. In addition, oversampling is performed on the samples of the minority class covers to mitigate the imbalance ratio (IR). We evaluated QOSM using three traditional classifiers (support vector machines [SVM], k-nearest neighbor [KNN], and naive Bayes [NB] classifier) on 10 publicly available KEEL data sets characterized by high IRs and varying degrees of overlap. Experimental results demonstrate that QOSM significantly improves classification accuracy compared to approaches that do not address class imbalance and overlapping. Moreover, QOSM consistently outperforms existing oversampling methods tested. With its compatibility with different classifiers, QOSM exhibits promising potential to improve the classification performance of highly imbalanced and overlapped data.
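
The pipeline described in the abstract (potential per sample, lowest-potential sample as cover center, oversampling inside minority covers) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption: the potential uses the Gaussian-kernel form common in quantum clustering (up to an additive constant), the cover radius is the distance from the center to the nearest sample of another class, synthetic points are drawn by interpolating between two members of a minority cover, and the problem is binary. The function names and the `sigma` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def quantum_potential(X, sigma=1.0):
    """Potential at each sample, up to an additive constant (assumed form)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))                      # Gaussian kernel weights
    return (d2 * w).sum(axis=1) / (2.0 * sigma ** 2 * w.sum(axis=1))

def build_covers(X, y, sigma=1.0):
    """Greedy covering: the lowest-potential uncovered sample becomes the next center."""
    V = quantum_potential(X, sigma)
    uncovered = np.ones(len(X), dtype=bool)
    covers = []
    while uncovered.any():
        idx = np.flatnonzero(uncovered)
        c = idx[np.argmin(V[idx])]                 # center: lowest potential among uncovered
        dist = np.linalg.norm(X - X[c], axis=1)
        other = y != y[c]
        # Assumed radius rule: distance to the nearest sample of a different class.
        radius = dist[other].min() if other.any() else np.inf
        inside = ~other & (dist < radius)
        inside[c] = True                           # the center always belongs to its own cover
        members = np.flatnonzero(uncovered & inside)
        uncovered[members] = False
        covers.append((c, radius, members))
    return covers

def oversample_minority(X, y, minority_label, sigma=1.0, seed=0):
    """Add synthetic minority samples inside minority covers until classes are balanced."""
    rng = np.random.default_rng(seed)
    covers = [m for c, _, m in build_covers(X, y, sigma) if y[c] == minority_label]
    n_new = int((y != minority_label).sum() - (y == minority_label).sum())
    if n_new <= 0 or not covers:
        return X, y
    sizes = np.array([len(m) for m in covers], dtype=float)
    synth = []
    # Pick covers in proportion to their size, then interpolate between two members.
    for k in rng.choice(len(covers), size=n_new, p=sizes / sizes.sum()):
        a, b = X[rng.choice(covers[k], size=2)]
        synth.append(a + rng.random() * (b - a))
    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return X_new, y_new
```

Because each cover in this sketch is a ball, every interpolated point stays inside the cover it was drawn from, so synthetic samples do not spill past the nearest sample of the other class. The balanced arrays returned by `oversample_minority` can then be passed to any classifier, such as the SVM, KNN, or naive Bayes models the abstract mentions.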

Source journal: ACS Applied Bio Materials (Chemistry, all). CiteScore: 9.40. Self-citation rate: 2.10%. Articles published: 464.