HiBBKA: A Hybrid Method With Resampling and Heuristic Feature Selection for Class-Imbalanced Data in Chemometrics

IF 2.3 4区 化学 Q1 SOCIAL WORK
Ying Guo, Ying Kou, Lun-Zhao Yi, Guang-Hui Fu
{"title":"HiBBKA: A Hybrid Method With Resampling and Heuristic Feature Selection for Class-Imbalanced Data in Chemometrics","authors":"Ying Guo,&nbsp;Ying Kou,&nbsp;Lun-Zhao Yi,&nbsp;Guang-Hui Fu","doi":"10.1002/cem.70029","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>In critical domains including medicinal chemistry, biomedicine, metabolomics, and computational toxicology, class imbalance in datasets and poor recognition accuracy for minority classes remain persistent challenges. While previous studies have employed resampling and feature selection techniques to address data imbalance and enhance classification performance, most approaches have focused on single-algorithm solutions rather than hybrid methodologies. Hybrid algorithms offer distinct advantages by integrating the strengths of multiple techniques, thereby providing more comprehensive and efficient solutions for handling imbalanced data. This study proposes HiBBKA, a novel hybrid algorithm combining radial-based under-sampling with SMOTE (RBU-SMOTE) and an improved binary black-winged kite algorithm (iBBKA) for feature selection. The proposed framework operates through two key phases: First, the RBU-SMOTE resampling method synergistically integrates radial-based under-sampling (RBU) with the synthetic minority oversampling technique (SMOTE), effectively addressing class-imbalance distribution while enhancing the quality of synthesized samples. Second, the enhanced iBBKA feature selection algorithm systematically identifies the most discriminative features critical for classification tasks. We comprehensively evaluate RBU-SMOTE and HiBBKA using multiple classifiers across 16 imbalanced datasets, including real-world medical datasets, with particular emphasis on the minority class performance. Experimental results demonstrate that RBU-SMOTE achieves competitive performance compared to existing resampling methods, while the complete HiBBKA framework significantly outperforms state-of-the-art algorithms in overall classification metrics, particularly in the minority class recognition.</p>\n </div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 5","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Chemometrics","FirstCategoryId":"92","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cem.70029","RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOCIAL WORK","Score":null,"Total":0}
引用次数: 0

Abstract

In critical domains including medicinal chemistry, biomedicine, metabolomics, and computational toxicology, class imbalance in datasets and poor recognition accuracy for minority classes remain persistent challenges. While previous studies have employed resampling and feature selection techniques to address data imbalance and enhance classification performance, most approaches have focused on single-algorithm solutions rather than hybrid methodologies. Hybrid algorithms offer distinct advantages by integrating the strengths of multiple techniques, thereby providing more comprehensive and efficient solutions for handling imbalanced data. This study proposes HiBBKA, a novel hybrid algorithm combining radial-based under-sampling with SMOTE (RBU-SMOTE) and an improved binary black-winged kite algorithm (iBBKA) for feature selection. The proposed framework operates through two key phases: First, the RBU-SMOTE resampling method synergistically integrates radial-based under-sampling (RBU) with the synthetic minority oversampling technique (SMOTE), effectively addressing class-imbalance distribution while enhancing the quality of synthesized samples. Second, the enhanced iBBKA feature selection algorithm systematically identifies the most discriminative features critical for classification tasks. We comprehensively evaluate RBU-SMOTE and HiBBKA using multiple classifiers across 16 imbalanced datasets, including real-world medical datasets, with particular emphasis on the minority class performance. Experimental results demonstrate that RBU-SMOTE achieves competitive performance compared to existing resampling methods, while the complete HiBBKA framework significantly outperforms state-of-the-art algorithms in overall classification metrics, particularly in the minority class recognition.

化学计量学中类不平衡数据的重采样和启发式特征选择混合方法
在包括药物化学、生物医学、代谢组学和计算毒理学在内的关键领域,数据集的类别不平衡和对少数类别的识别准确性差仍然是持续存在的挑战。虽然以前的研究使用重采样和特征选择技术来解决数据不平衡和提高分类性能,但大多数方法都集中在单一算法解决方案上,而不是混合方法。混合算法通过综合多种技术的优势,为处理不平衡数据提供更全面、更高效的解决方案,具有明显的优势。本研究提出了一种将径向欠采样与SMOTE算法(RBU-SMOTE)和改进的二进制黑翼风筝算法(iBBKA)相结合的特征选择混合算法HiBBKA。该框架通过两个关键阶段进行:首先,RBU-SMOTE重采样方法将基于径向的欠采样(RBU)与合成少数派过采样技术(SMOTE)协同集成,有效地解决了类不平衡分布问题,同时提高了合成样本的质量。其次,改进的iBBKA特征选择算法系统地识别出对分类任务最具判别性的特征。我们使用多个分类器在16个不平衡数据集(包括现实世界的医疗数据集)中全面评估RBU-SMOTE和HiBBKA,特别强调少数类别的表现。实验结果表明,与现有的重采样方法相比,RBU-SMOTE取得了具有竞争力的性能,而完整的HiBBKA框架在总体分类指标上明显优于最先进的算法,特别是在少数类识别方面。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Chemometrics
Journal of Chemometrics 化学-分析化学
CiteScore
5.20
自引率
8.30%
发文量
78
审稿时长
2 months
期刊介绍: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信