Dual-metric guided multi-strategy hybrid optimization for feature selection on high-dimensional medical data
Yan Kang, Dongsheng Zheng, Haining Wang, Yue Peng, Shixuan Zhou
Swarm and Evolutionary Computation, volume 98, Article 102118. Published 2025-08-22.
DOI: 10.1016/j.swevo.2025.102118
URL: https://www.sciencedirect.com/science/article/pii/S2210650225002767
Journal impact factor: 8.5; JCR: Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
The high-dimensional feature selection (FS) problem is challenging in medical fields due to the “curse of dimensionality” and the intricate relationships among features. Although hybrid FS methods achieve high-performance solutions using various mutual information metrics, such as symmetrical uncertainty (SU) and the maximal information coefficient (MIC), they often overlook the differences between these metrics, and their search strategies still need improvement to escape local optima. To address these challenges, we propose a dual-metric guided multi-strategy hybrid FS method (DGM) for high-dimensional medical datasets. The importance of features is first evaluated based on the SU and MIC metrics, and the redundancy between features is then reduced by fast clustering and grouping strategies. Furthermore, a two-level sampling strategy is proposed to guarantee the diversity and complementarity of the population by considering the Jaccard similarity and the correlation between features. A novel set-based multi-population PSO is designed to collaboratively search for the optimal feature subset while obtaining feature importance during the evaluation process through a tri-archive assisted evolution approach. Specifically, two local archives help individuals escape from local optima, while the global archive optimizes the population. Finally, we develop various squeeze-expand mechanisms that dynamically adjust both the search space and the length of individuals to balance exploration and exploitation effectively. Experimental results on 13 medical datasets show that DGM significantly improves classification performance while selecting fewer features. The t-test results further indicate that DGM significantly outperforms all comparison methods in classification performance on 10 datasets, highlighting its strong competitiveness.
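For readers unfamiliar with two of the quantities named in the abstract, the following is a minimal Python sketch of symmetrical uncertainty between two discrete variables and of Jaccard similarity between two feature subsets. It uses the standard textbook definitions and is not taken from the paper: the function names, the handling of constant variables and empty sets, and the assumption that features are already discretized are all illustrative choices, and DGM's actual ranking, clustering, and sampling procedures are not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1].

    Assumes x and y are equal-length sequences of discrete values
    (continuous features would need to be discretized first).
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant; returning 0 is a convention
        # chosen for this sketch, not something stated in the abstract.
        return 0.0
    joint_entropy = entropy(list(zip(x, y)))
    mutual_info = hx + hy - joint_entropy
    return 2.0 * mutual_info / (hx + hy)


def jaccard_similarity(subset_a, subset_b):
    """|A ∩ B| / |A ∪ B| for two collections of feature indices."""
    a, b = set(subset_a), set(subset_b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical (assumption)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    feature = [0, 0, 1, 1]
    label = [0, 0, 1, 1]
    print(symmetrical_uncertainty(feature, label))
    print(jaccard_similarity([1, 2, 3], [2, 3, 4]))
```

A feature that perfectly determines the label has an SU of 1, and a feature independent of the label has an SU of 0. A low Jaccard similarity between two candidate subsets indicates that they share few features, which is one way to quantify the population diversity the abstract refers to.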
About the journal:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.