Towards large-scale multi-objective feature selection: A two-stage evolutionary algorithm guided by dual feature weightings

IF 7.5 | Zone 1 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence
Gaohui Li, Zefeng Chen, Yuren Zhou, Zhengxin Huang, Xiaoyun Xia
Expert Systems with Applications, Volume 298, Article 129823. Published 2025-10-02.
DOI: 10.1016/j.eswa.2025.129823
https://www.sciencedirect.com/science/article/pii/S0957417425034384

Abstract

Feature selection (FS) is a critical task in high-dimensional data processing. It aims to identify the most discriminative subset of features in order to improve model performance and reduce computational complexity. In recent years, multi-objective evolutionary algorithms have been widely applied to FS problems because they can optimize multiple objectives simultaneously (for an FS problem, classification accuracy and subset size). However, on large-scale multi-objective FS problems, existing algorithms often suffer from the vast search space and limited search capability, which makes them prone to local optima. To address these challenges, this paper proposes a two-stage evolutionary algorithm guided by dual feature weightings, named TSEA/DFW. In the first stage, an evolutionary search is performed under the guidance of a filter-based feature weighting strategy. Key features are then identified from the population distribution and the optimal solutions, thereby shrinking the search space. In the second stage, a refined search is conducted in the shrunken feature space to boost search efficiency and solution quality. To this end, a novel weighting strategy named Pareto-based hierarchical feature weighting is proposed, which captures the variation in feature performance across different non-dominated levels, reinforces the contribution of high-quality solutions, and preserves useful information from suboptimal solutions. Additionally, a novel offspring reproduction procedure guided by stage-specific feature weights is designed to further enhance search capability. Experimental results on 13 real-world datasets show that TSEA/DFW performs best on 10 datasets in terms of the hypervolume (HV) metric and on 11 datasets in terms of inverted generational distance (IGD), indicating a clear advantage over seven state-of-the-art feature selection methods. The performance improvements stem from the two-stage evolutionary framework guided by dual feature weighting, which enables the early identification of important features, thereby effectively reducing the search space and enhancing search efficiency. In addition, further analysis shows that TSEA/DFW generalizes well across diverse classifiers, and that its two-stage evolutionary framework is general enough to integrate any mainstream FS algorithm into its second stage, exhibiting robust applicability and scalability.
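The abstract does not give the formula behind Pareto-based hierarchical feature weighting, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two minimized objectives (classification error and subset size), ranks solutions into non-dominated levels by peeling fronts, and lets each solution vote for its selected features with a strength that decays geometrically with its level. The function names and the `decay` parameter are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

# (classification error, number of selected features); both are minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_levels(objs: Sequence[Objectives]) -> List[int]:
    """Assign each solution its non-dominated level (0 = first front) by peeling fronts."""
    levels = [-1] * len(objs)
    remaining = set(range(len(objs)))
    level = 0
    while remaining:
        # A solution is in the current front if nothing still remaining dominates it.
        front = {
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        }
        for i in front:
            levels[i] = level
        remaining -= front
        level += 1
    return levels


def hierarchical_feature_weights(
    masks: Sequence[Sequence[int]],
    objs: Sequence[Objectives],
    decay: float = 0.5,  # assumed decay rate, not taken from the paper
) -> List[float]:
    """Illustrative weighting: a solution at level k votes decay**k for each feature it selects.

    Solutions on better fronts contribute more, while solutions on worse
    fronts still contribute a smaller, non-zero amount. Weights sum to 1.
    """
    levels = non_dominated_levels(objs)
    n_features = len(masks[0])
    votes = [0.0] * n_features
    for mask, level in zip(masks, levels):
        strength = decay ** level
        for j, selected in enumerate(mask):
            if selected:
                votes[j] += strength
    total = sum(votes)
    if total == 0:
        # No feature selected anywhere: fall back to uniform weights.
        return [1.0 / n_features] * n_features
    return [v / total for v in votes]


# Toy population: four feature masks over four features with their objectives.
masks = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1]]
objs = [(0.10, 2), (0.20, 1), (0.15, 3), (0.30, 3)]
weights = hierarchical_feature_weights(masks, objs)
print(non_dominated_levels(objs), [round(w, 3) for w in weights])
```

In this toy population the first two solutions form the first front, so the features they select receive the largest weights. Weights of this kind could then bias which features are kept or flipped during offspring reproduction, although how TSEA/DFW actually uses them is not described in the abstract.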
Source journal
Expert Systems with Applications (Engineering Technology: Engineering, Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.