A novel importance-guided particle swarm optimization based on MLP for solving large-scale feature selection problems

IF 8.2, CAS Zone 1 (Computer Science), JCR Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Swarm and Evolutionary Computation, Pub Date: 2024-11-04, DOI: 10.1016/j.swevo.2024.101760

Yu Xue, Chenyi Zhang

Swarm and Evolutionary Computation, Volume 91, Article 101760. Full text: https://www.sciencedirect.com/science/article/pii/S2210650224002980

Citations: 0

Abstract

Feature selection is a crucial data preprocessing technique that effectively reduces dataset size and enhances the performance of machine learning models. Evolutionary computation (EC) based feature selection has become one of the most important families of feature selection methods. However, the performance of existing EC methods decreases significantly when dealing with datasets with thousands of dimensions. To address this issue, this paper proposes a novel method called importance-guided particle swarm optimization based on MLP (IGPSO) for feature selection. IGPSO uses a neural network trained in two stages to learn a feature importance vector, which then guides population initialization and evolution. In the two learning stages, the positive samples are used to learn the importance of useful features, while the negative samples are used to identify invalid features. The importance vector is then generated by combining the information from both categories. Finally, it replaces the acceleration factors and inertia weight of the original binary PSO, so that the individual and social acceleration factors are positively correlated with the importance values, while the inertia weight is negatively correlated with them. Furthermore, IGPSO uses a flip probability to update the individuals. Experimental results on 24 datasets demonstrate that, compared to other state-of-the-art algorithms, IGPSO can significantly reduce the number of features while maintaining satisfactory classification accuracy, thus achieving high-quality feature selection. In particular, on large-scale datasets it achieves an average reduction of 0.1 in the fitness value and an average increase of 6.7% in classification accuracy relative to those algorithms.
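The abstract describes the guidance mechanism only qualitatively, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an importance vector with values in [0, 1] is already available (the MLP training is not shown), maps it linearly to per-feature acceleration factors and inertia weights, and uses `abs(tanh(v))` as the flip probability. The function names, the constants `c_max`, `w_min`, `w_max`, and the `toy_fitness` stand-in are all hypothetical.

```python
import math
import random


def init_population(importance, n_particles, rng):
    # Assumption: feature j starts selected with probability importance[j].
    return [[1 if rng.random() < p else 0 for p in importance]
            for _ in range(n_particles)]


def guided_coefficients(importance, c_max=2.0, w_min=0.4, w_max=0.9):
    # Assumption: linear maps. Acceleration factors rise with importance,
    # inertia weight falls with it, matching the abstract qualitatively.
    c1 = [c_max * p for p in importance]
    c2 = list(c1)
    w = [w_max - (w_max - w_min) * p for p in importance]
    return c1, c2, w


def step(x, v, pbest, gbest, coeffs, rng):
    # Update one particle; returns the new (position, velocity).
    c1, c2, w = coeffs
    new_x, new_v = [], []
    for j in range(len(x)):
        vj = (w[j] * v[j]
              + c1[j] * rng.random() * (pbest[j] - x[j])
              + c2[j] * rng.random() * (gbest[j] - x[j]))
        flip_prob = abs(math.tanh(vj))  # assumed V-shaped transfer function
        new_x.append(1 - x[j] if rng.random() < flip_prob else x[j])
        new_v.append(vj)
    return new_x, new_v


def toy_fitness(mask, target):
    # Hypothetical stand-in for classifier error plus a feature-count penalty.
    mismatches = sum(m != t for m, t in zip(mask, target))
    return mismatches / len(mask) + 0.01 * sum(mask) / len(mask)


def run(importance, target, n_particles=10, n_iters=30, seed=0):
    rng = random.Random(seed)
    swarm = init_population(importance, n_particles, rng)
    vel = [[0.0] * len(importance) for _ in swarm]
    coeffs = guided_coefficients(importance)
    pbest = [list(x) for x in swarm]
    pfit = [toy_fitness(x, target) for x in swarm]
    g = min(range(n_particles), key=pfit.__getitem__)
    gbest, gfit = list(pbest[g]), pfit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            swarm[i], vel[i] = step(swarm[i], vel[i], pbest[i], gbest, coeffs, rng)
            f = toy_fitness(swarm[i], target)
            if f < pfit[i]:
                pbest[i], pfit[i] = list(swarm[i]), f
                if f < gfit:
                    gbest, gfit = list(swarm[i]), f
    return gbest, gfit
```

Calling `run(importance, target)` returns the best mask found and its fitness. In a real experiment, `toy_fitness` would be replaced by a classifier-based fitness evaluated on the selected feature subset, and the linear coefficient maps are only one of many ways to realize the correlations the abstract states.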

Source journal

Swarm and Evolutionary Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)

CiteScore: 16.00

Self-citation rate: 12.00%

Articles published: 169

About the journal: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.