UniBFS: A novel uniform-solution-driven binary feature selection algorithm for high-dimensional data

Impact factor 8.2 | CAS Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Swarm and Evolutionary Computation | Pub Date: 2024-09-06 | DOI: 10.1016/j.swevo.2024.101715

Behrouz Ahadzadeh, Moloud Abdar, Mahdieh Foroumandi, Fatemeh Safara, Abbas Khosravi, Salvador García, Ponnuthurai Nagaratnam Suganthan

Volume 91, Article 101715

Citations: 0

Abstract

Feature selection (FS) is a crucial technique in machine learning and data mining, serving purposes such as simplifying model construction, facilitating knowledge discovery, improving computational efficiency, and reducing memory consumption. Despite its importance, the ever-growing search space of high-dimensional datasets poses significant challenges to FS methods, including the "curse of dimensionality," susceptibility to local optima, and high computational and memory costs. To overcome these challenges, this study develops a new FS algorithm named Uniform-solution-driven Binary Feature Selection (UniBFS). UniBFS exploits binary coding, an inherent characteristic of binary algorithms, to search the entire problem space, identifying relevant features while avoiding irrelevant ones. To improve the effectiveness and efficiency of UniBFS, the paper also presents a Redundant Features Elimination (RFE) algorithm. RFE performs a local search in a very small subspace of the solutions that UniBFS obtains at different stages, and removes redundant features that do not increase classification accuracy. Moreover, the study proposes a hybrid algorithm that combines UniBFS with two filter-based FS methods, ReliefF and Fisher score, to identify pertinent features during the global search phase. The proposed algorithms are evaluated on 30 high-dimensional datasets ranging from 2,000 to 54,676 dimensions, and comparisons of their effectiveness and efficiency with state-of-the-art techniques demonstrate their superiority.
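The abstract names Fisher as one of the two filter methods combined with UniBFS. The sketch below computes the standard Fisher score per feature (between-class scatter divided by within-class scatter) with NumPy, then switches on the top-k features in a binary mask, the one-bit-per-feature encoding that binary FS algorithms operate on. The abstract does not say how the paper injects filter rankings into its global search, so `fisher_scores`, `top_k_mask`, and the idea of seeding a binary solution from the top-k features are illustrative assumptions; ReliefF is not shown.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]  # samples belonging to class `label`
        nk = Xk.shape[0]
        between += nk * (Xk.mean(axis=0) - overall_mean) ** 2
        within += nk * Xk.var(axis=0)  # population variance inside the class
    # Small floor avoids division by zero for features constant inside every class.
    return between / np.maximum(within, 1e-12)

def top_k_mask(scores, k):
    """Hypothetical helper: binary solution with the k highest-scoring features set to True."""
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(scores)[::-1][:k]] = True
    return mask

# Toy data: feature 0 separates the two classes well, feature 1 barely does.
X = np.array([[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]])
y = np.array([0, 0, 1, 1])
scores = fisher_scores(X, y)    # roughly [100.0, 0.25]
mask = top_k_mask(scores, k=1)  # array([ True, False])
```

Because the score is computed independently per feature, it costs one pass over the data, which is why filter rankings like this are cheap enough to use on datasets with tens of thousands of dimensions.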
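The abstract describes RFE only at a high level: a local search in a very small subspace of a solution that discards features which do not increase classification accuracy. A minimal sketch, assuming that "small subspace" means single-feature removals from the currently selected set and that a feature is dropped whenever removing it does not lower accuracy. `eliminate_redundant`, the `evaluate` callback, and `toy_accuracy` are hypothetical names; the paper's visiting order, stopping rule, and classifier are not given in the abstract. This RFE (Redundant Features Elimination) is also distinct from scikit-learn's `RFE` (Recursive Feature Elimination).

```python
from typing import Callable
import numpy as np

def eliminate_redundant(mask: np.ndarray,
                        evaluate: Callable[[np.ndarray], float]) -> tuple[np.ndarray, float]:
    """Greedy pass over the selected features only: drop a feature whenever
    removing it does not lower the evaluation score (it adds no accuracy)."""
    mask = np.asarray(mask, dtype=bool).copy()
    best = evaluate(mask)
    for j in np.flatnonzero(mask):  # visit only features that are switched on
        if mask.sum() == 1:
            break  # always keep at least one feature
        mask[j] = False
        score = evaluate(mask)
        if score >= best:
            best = score    # feature j was redundant: leave it removed
        else:
            mask[j] = True  # feature j contributed: restore it
    return mask, best

# Hypothetical evaluator: only features 0 and 2 carry information.
def toy_accuracy(m: np.ndarray) -> float:
    return 0.5 + 0.25 * float(m[0]) + 0.25 * float(m[2])

selected, acc = eliminate_redundant(np.ones(4, dtype=bool), toy_accuracy)
```

In practice `evaluate` would wrap a classifier, for example `cross_val_score(KNeighborsClassifier(), X[:, m], y, cv=5).mean()` from scikit-learn; the abstract does not state which classifier the authors use. The pass needs at most one evaluation per selected feature, so restricting it to the selected features keeps it cheap even when the full dimensionality is very large.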




Source journal: Swarm and Evolutionary Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)

CiteScore: 16.00

Self-citation rate: 12.00%

Articles published: 169

Journal description: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advances in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence, and it also welcomes survey papers on current topics and novel applications. Topics of interest include, but are not limited to: Genetic Algorithms and Genetic Programming; Evolution Strategies and Evolutionary Programming; Differential Evolution; Artificial Immune Systems; Particle Swarms; Ant Colony; Bacterial Foraging; Artificial Bees; Firefly Algorithm; Harmony Search; Artificial Life; Digital Organisms; Estimation of Distribution Algorithms; Stochastic Diffusion Search; Quantum Computing; Nano Computing; Membrane Computing; Human-centric Computing; Hybridization of Algorithms; Memetic Computing; Autonomic Computing; Self-organizing Systems; and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.