A decentralized metaheuristic approach to feature selection inspired by social interactions within a societal framework, for handling datasets of diverse sizes
IF 4.2 | Zone 3, Computer Science | JCR Q2, Computer Science, Artificial Intelligence
Authors: Sobia Tariq Javed, Kashif Zafar, Irfan Younas
Journal: Big Data Research, Volume 41, Article 100556
Publication date: 2025-08-28
Publication type: Journal Article
DOI: 10.1016/j.bdr.2025.100556
URL: https://www.sciencedirect.com/science/article/pii/S2214579625000516
Citations: 0
Abstract
The rapid advancement of technology has led to the generation of big data. When effectively mined, processed, and analyzed, this vast and diverse data can reveal valuable patterns. However, it also introduces the “curse of dimensionality,” which can degrade the performance of machine learning models. Feature Selection (FS) is a data preprocessing technique that aims to identify the optimal feature subset in order to improve model efficiency and reduce processing time. Numerous metaheuristic wrapper-based FS techniques have been explored in the literature. A significant drawback of many of these algorithms is their dependence on centralized learning, in which the global best solution drives the search direction. This centralized approach is risky: if the global best is misleading, it can hinder the exploration and exploitation of other promising regions and prevent the search from finding the true global optimum. This paper proposes the Binary Kids Learning Optimization Algorithm (BKLO), a binary variant of the novel decentralized metaheuristic Kids Learning Optimization Algorithm (KLO), for wrapper-mode feature selection in classification. The continuous solutions of KLO are mapped to binary space by a transfer function, and two transfer functions are compared: the hyperbolic tangent (V-shaped) and the sigmoid (S-shaped). BKLO is compared with seven state-of-the-art algorithms. Performance is evaluated using several assessment indicators on fifteen benchmark datasets of small, medium, and large dimensionality from the University of California Irvine (UCI) repository and Arizona State University. Experiments and Friedman's Mean Rank (FMR) statistical tests show that BKLO selects fewer features while achieving higher classification accuracy than the competing algorithms.
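The abstract names two transfer functions but does not give the binarization rules, so the following is only a minimal sketch of how S-shaped and V-shaped transfer functions are commonly used in binary metaheuristics; it is not the authors' BKLO implementation. The function names, the use of |tanh(x)| as the V-shaped form, and the update rules (sample the bit for the S-shaped case, flip the current bit for the V-shaped case) are assumptions made for illustration.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function: maps a real value to a probability in [0, 1].

    Written in a numerically stable form so large |x| does not overflow.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def v_shaped(x):
    """V-shaped transfer function |tanh(x)|: symmetric around zero, range [0, 1]."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """Assumed S-shaped rule: set each bit to 1 with probability sigmoid(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """Assumed V-shaped rule: flip the current bit with probability |tanh(x)|."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


if __name__ == "__main__":
    rng = random.Random(0)               # fixed seed for repeatability
    position = [-2.0, -0.3, 0.0, 0.4, 3.0]  # hypothetical continuous solution
    bits = [0, 1, 0, 1, 0]               # hypothetical current feature mask
    print("S-shaped mask:", binarize_s(position, rng))
    print("V-shaped mask:", binarize_v(position, bits, rng))
```

In a feature-selection setting, a 1 in the resulting mask would mean the corresponding feature is kept. Under these assumed rules, the S-shaped function depends on the sign of the continuous value, while the V-shaped function depends only on its magnitude and needs the current mask as input.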
About the journal:
The journal aims to promote and communicate advances in big data research by providing a fast and high quality forum for researchers, practitioners and policy makers from the very many different communities working on, and with, this topic.
The journal will accept papers on foundational aspects in dealing with big data, as well as papers on specific Platforms and Technologies used to deal with big data. To promote Data Science and interdisciplinary collaboration between fields, and to showcase the benefits of data driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.