Dual-metric guided multi-strategy hybrid optimization for feature selection on high-dimensional medical data

IF 8.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yan Kang, Dongsheng Zheng, Haining Wang, Yue Peng, Shixuan Zhou
Journal: Swarm and Evolutionary Computation, Volume 98, Article 102118
Publication date: 2025-08-22
Publication type: Journal Article
DOI: 10.1016/j.swevo.2025.102118
URL: https://www.sciencedirect.com/science/article/pii/S2210650225002767
Citations: 0

Abstract

The high-dimensional feature selection (FS) problem is challenging in medical fields due to the “curse of dimensionality” and the intricate relationships among features. Although hybrid FS methods achieve high-performance solutions according to various mutual information metrics, such as symmetrical uncertainty (SU) and the maximal information coefficient (MIC), they often overlook the differences between these metrics and still need better search strategies to escape local optima. To address these challenges, we propose a dual-metric guided multi-strategy hybrid FS method (DGM) for high-dimensional medical datasets. The importance of features is first evaluated using the SU and MIC metrics, and the redundancy between features is then reduced by fast clustering and grouping strategies. Furthermore, a two-level sampling strategy is proposed to guarantee the diversity and complementarity of the population by considering the Jaccard similarity and the correlation between features. A novel set-based multi-population PSO is designed to collaboratively search for the optimal feature subset while obtaining feature importance during the evaluation process through a tri-archive assisted evolution approach. Specifically, two local archives help individuals escape from local optima, while the global archive optimizes the population. Finally, we develop various squeeze-expand mechanisms that dynamically adjust both the search space and the length of individuals to balance exploration and exploitation effectively. The experimental results on 13 medical datasets show that DGM significantly improves classification performance while selecting fewer features. The t-test results further indicate that DGM significantly outperforms all comparison methods in classification performance on 10 datasets, highlighting its strong competitiveness.
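To make two of the quantities named in the abstract concrete, here is a minimal Python sketch of symmetrical uncertainty (SU) between discrete variables and of the Jaccard similarity between two feature subsets. It is an illustration using the standard textbook definitions, not the authors' implementation: the function names are hypothetical, features are assumed to be already discretized, and MIC is left out because it needs a dedicated estimator.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard definition: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so no information is shared
    joint = entropy(list(zip(x, y)))  # H(X, Y)
    mutual_info = hx + hy - joint     # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two feature-index subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty subsets are identical
    return len(a & b) / len(a | b)


# Toy data (made up for illustration): two discretized features and a label.
label = [0, 0, 1, 1]
feature_a = [0, 0, 1, 1]   # perfectly aligned with the label
feature_b = [0, 1, 0, 1]   # independent of the label

su_a = symmetrical_uncertainty(feature_a, label)
su_b = symmetrical_uncertainty(feature_b, label)
overlap = jaccard_similarity({1, 2, 3}, {2, 3, 4})
```

In this toy case the feature that mirrors the label gets an SU of 1 and the independent one gets 0. A Jaccard score like `overlap` is the kind of measure a sampling step could use to keep candidate feature subsets from overlapping too heavily, although how DGM combines it with feature correlation is not specified in the abstract.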
Source journal
Swarm and Evolutionary Computation
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS
CiteScore: 16.00
Self-citation rate: 12.00%
Articles published: 169
About the journal: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.