Towards large-scale multi-objective feature selection: A two-stage evolutionary algorithm guided by dual feature weightings
Authors: Gaohui Li, Zefeng Chen, Yuren Zhou, Zhengxin Huang, Xiaoyun Xia
Journal: Expert Systems with Applications, Volume 298, Article 129823 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.5)
Publication date: 2025-10-02
DOI: 10.1016/j.eswa.2025.129823
URL: https://www.sciencedirect.com/science/article/pii/S0957417425034384
Citations: 0
Abstract
Feature selection (FS) is a critical task in high-dimensional data processing: it aims to identify the most discriminative subset of features in order to improve model performance and reduce computational complexity. In recent years, multi-objective evolutionary algorithms have been widely applied to FS problems because they can optimize multiple objectives simultaneously (for an FS problem, classification accuracy and subset size). However, on large-scale multi-objective FS problems, existing algorithms often struggle with the vast search space and have limited search capability, which makes them prone to local optima. To address these challenges, this paper proposes a two-stage evolutionary algorithm guided by dual feature weightings, named TSEA/DFW. In the first stage, an evolutionary search is performed under the guidance of a filter-based feature weighting strategy. The key features are then identified from the population distribution and the optimal solutions, thereby shrinking the search space. In the second stage, a refined search is conducted in the shrunken feature space to boost search efficiency and solution quality. To this end, a novel weighting strategy named Pareto-based hierarchical feature weighting is proposed, which captures the variation in feature performance across different non-dominated levels, reinforces the contribution of high-quality solutions, and preserves useful information from suboptimal solutions. Additionally, a novel offspring reproduction procedure guided by stage-specific feature weights is designed to further enhance search capability. Experimental results on 13 real-world datasets show that TSEA/DFW performs best on 10 datasets in terms of the hypervolume (HV) metric and on 11 datasets in terms of inverted generational distance (IGD), demonstrating its significant superiority over seven state-of-the-art feature selection methods. The performance improvements stem from the two-stage evolutionary framework guided by dual feature weighting, which enables early identification of important features and thereby effectively reduces the search space and enhances search efficiency. Further analysis demonstrates that TSEA/DFW generalizes well across diverse classifiers, and that its two-stage evolutionary framework is a general and powerful one that can integrate any mainstream FS algorithm into its second stage, exhibiting robust applicability and scalability.
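The abstract does not give the formula behind Pareto-based hierarchical feature weighting, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It sorts candidate feature subsets into non-dominated levels on two minimized objectives (classification error and fraction of features selected), then scores each feature by how often it is selected, with solutions on better levels counting more. The function names, the geometric `decay` factor, and the normalization are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# (classification error, fraction of features selected); both are minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_levels(objs: Sequence[Objectives]) -> List[List[int]]:
    """Peel off successive non-dominated fronts; each front is a list of indices."""
    remaining = set(range(len(objs)))
    levels: List[List[int]] = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        ]
        levels.append(sorted(front))
        remaining -= set(front)
    return levels


def hierarchical_feature_weights(
    masks: Sequence[Sequence[int]],
    objs: Sequence[Objectives],
    decay: float = 0.5,  # hypothetical: each deeper level counts half as much
) -> List[float]:
    """Level-decayed selection frequency per feature (an assumed scheme).

    Solutions on level 0 contribute 1, level 1 contributes `decay`,
    level 2 contributes `decay ** 2`, and so on, so high-quality solutions
    dominate the score while worse levels still contribute some information.
    """
    n_features = len(masks[0])
    weights = [0.0] * n_features
    total = 0.0
    for level, front in enumerate(non_dominated_levels(objs)):
        level_weight = decay ** level
        for i in front:
            total += level_weight
            for f, bit in enumerate(masks[i]):
                if bit:
                    weights[f] += level_weight
    return [w / total for w in weights]


if __name__ == "__main__":
    # Four toy solutions over four features (1 = feature selected).
    masks = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]
    objs = [(0.10, 0.50), (0.20, 0.50), (0.10, 1.00), (0.30, 0.50)]
    print(non_dominated_levels(objs))
    print(hierarchical_feature_weights(masks, objs))
```

In a two-stage setting like the one described, weights of this kind could bias which bits are flipped when producing offspring, or rank features when deciding which ones to keep in the shrunken search space; how TSEA/DFW actually does either is not specified in the abstract.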
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.