Boosting classification accuracy using an efficient stochastic optimization technique for feature selection in high-dimensional data

IF 8.2 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)

Swarm and Evolutionary Computation · Pub Date: 2025-06-05 · DOI: 10.1016/j.swevo.2025.102025

Noureen Talpur, Shoaib-ul Hassan, Mohd Hafizul Afifi Abdullah, Abdulrahman Aminu Ghali, Ambreen Abdul Raheem, Shazia Khatoon, Norshakirah Aziz, Sivashankari Alaganandham

Swarm and Evolutionary Computation, Volume 97, Article 102025. Journal Article. https://www.sciencedirect.com/science/article/pii/S221065022500183X

Citations: 0

Abstract

Many real-world problems involve a large number of features, several of which are irrelevant or redundant. Such features not only increase dimensionality but also reduce the classification performance of machine learning models. To address this issue, feature selection methods have been used extensively in the literature, either by applying existing algorithms or by developing new ones. However, many of these approaches reduce the feature set insufficiently because they become trapped in local minima of the large search space. Hence, this study proposes using a recent stochastic optimization technique, the Osprey Optimization Algorithm (OOA), for feature selection. OOA balances exploration and exploitation effectively during the search process, which makes it suitable for high-dimensional optimization tasks. To evaluate the selected feature subsets, the study employs the k-nearest neighbor (k-NN) classifier. Comparative results between OOA and five state-of-the-art algorithms show that OOA achieves the highest average classification accuracy (89.22%) while selecting the fewest features on average (70.63), reducing the feature burden by 62.80%. Moreover, a non-parametric Wilcoxon signed-rank test on classification accuracy yields a p-value below 0.05 (5.00E-02), indicating a statistically significant difference in performance among the six algorithms.
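The wrapper setup described in the abstract, in which a candidate feature subset is scored by a k-NN classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the fitness function, so the weighted sum of error rate and selected-feature ratio, the weight `alpha = 0.99`, the choice `k = 3`, leave-one-out evaluation, and the toy data are all hypothetical. The OOA search itself is not shown; any optimizer that proposes binary masks could call `fitness`.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def apply_mask(X, mask):
    """Keep only the columns whose mask entry is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


def loo_accuracy(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only the selected features."""
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    Xs = apply_mask(X, mask)
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return hits / len(Xs)


def fitness(X, y, mask, alpha=0.99, k=3):
    """Lower is better. Assumed form: alpha * error + (1 - alpha) * fraction kept."""
    error = 1.0 - loo_accuracy(X, y, mask, k)
    kept_ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * kept_ratio


def feature_reduction(mask):
    """Percentage of features discarded by the mask."""
    return 100.0 * (1.0 - sum(mask) / len(mask))


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
X_toy = [
    [0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 8.0, 4.0],
    [1.0, 4.9, 1.2], [1.1, 1.1, 8.8], [1.2, 8.1, 4.1],
]
y_toy = [0, 0, 0, 1, 1, 1]

for mask in ([1, 1, 1], [1, 0, 0]):
    print(mask, round(fitness(X_toy, y_toy, mask), 4), round(feature_reduction(mask), 2))
```

On this toy data the single informative feature scores a lower (better) fitness than the full feature set, which is the behavior a wrapper method relies on: noisy columns distort the distances k-NN uses, so dropping them can raise accuracy and shrink the subset at the same time.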


Source journal

Swarm and Evolutionary Computation
Categories: Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

CiteScore: 16.00
Self-citation rate: 12.00%
Articles published: 169

About the journal: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.