A self-adjusting representation-based multitask PSO for high-dimensional feature selection

IF 8.2 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Swarm and Evolutionary Computation | Pub Date: 2025-07-23 | DOI: 10.1016/j.swevo.2025.102084

Li Deng, Xiaohui Su, Bo Wei

Swarm and Evolutionary Computation, volume 98, Article 102084. Full text: https://www.sciencedirect.com/science/article/pii/S2210650225002421

Citations: 0

Abstract

As a critical preprocessing step in machine learning tasks, feature selection (FS) aims to identify informative features from the original datasets. However, FS is commonly formulated as an NP-hard combinatorial optimization problem, and it becomes particularly difficult when compounded by exponentially expanding search spaces and complex feature interactions. Owing to its simplicity of implementation, Particle Swarm Optimization (PSO) has been extensively used in FS tasks. Nevertheless, the optimization process frequently converges to local optima when handling high-dimensional (>1000D) datasets. To address this issue, a self-adjusting representation-based multitask PSO (SAR-MTPSO) is proposed in this paper. Firstly, the knee point strategy and an elite feature-preserving strategy are employed to obtain promising particles with key features. Based on these particles, a new multitask framework is introduced, in which two tasks are constructed using a self-adjusting representation strategy. Secondly, a two-layer knowledge transfer strategy is proposed to dynamically promote the sharing and exchange of useful information between the two tasks. Finally, an adaptive re-initialization strategy is proposed to enhance the exploitation and exploration capabilities of the two tasks, respectively. SAR-MTPSO was compared with 10 representative FS algorithms on 21 high-dimensional datasets. Experimental results show that SAR-MTPSO achieves the highest classification accuracy with smaller feature subsets on 17 of the 21 datasets.
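The abstract names a knee point strategy but does not define it. As an illustration only, the sketch below uses one common definition: on a bi-objective front of (classification error, number of selected features), both minimized, the knee is the solution lying farthest from the straight line joining the two extreme solutions, measured toward the ideal point after min-max normalization. The function name `knee_point` and the sample front are hypothetical and are not taken from the paper; SAR-MTPSO may define its knee point differently.

```python
from math import hypot


def knee_point(front):
    """Return the index of the knee of a bi-objective front.

    `front` is a list of (error_rate, n_features) pairs, both minimized,
    and is assumed to be non-dominated. This is a generic definition,
    not necessarily the one used in SAR-MTPSO.
    """
    if len(front) < 3:
        return 0  # no interior point can bulge toward the ideal point

    xs = [p[0] for p in front]
    ys = [p[1] for p in front]
    x_span = (max(xs) - min(xs)) or 1.0  # guard against a zero range
    y_span = (max(ys) - min(ys)) or 1.0
    # Min-max normalize so both objectives lie in [0, 1].
    pts = [((x - min(xs)) / x_span, (y - min(ys)) / y_span) for x, y in front]

    # Extreme solutions: lowest error (a) and fewest features (b).
    a = min(pts, key=lambda p: (p[0], p[1]))
    b = min(pts, key=lambda p: (p[1], p[0]))
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = hypot(dx, dy) or 1.0

    def bulge(p):
        # Signed distance from the line a-b; positive toward the origin.
        return (dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / norm

    return max(range(len(pts)), key=lambda i: bulge(pts[i]))


# Hypothetical front: (error rate, number of selected features).
example_front = [(0.10, 50), (0.11, 20), (0.12, 10), (0.20, 5), (0.35, 2)]
knee_index = knee_point(example_front)
```

In this made-up front, the third solution (error 0.12 with 10 features) is selected: it gives up little accuracy relative to the 50-feature solution while using far fewer features, which is the kind of trade-off a knee point is meant to capture.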




Source journal

Swarm and Evolutionary Computation | COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS

CiteScore: 16.00
Self-citation rate: 12.00%
Articles published: 169

Journal description: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.