White shark beetle optimizer enabled deep fuzzy clustering for feature selection and big data clustering in MapReduce framework

IF 2.7 · Zone 1 (Mathematics) · JCR Q2: COMPUTER SCIENCE, THEORY & METHODS

Fuzzy Sets and Systems · Pub Date: 2025-07-05 · DOI: 10.1016/j.fss.2025.109536

Omkaresh Kulkarni, Chitrakant Banchhor, V. Ravi Sankar

Fuzzy Sets and Systems, Volume 519, Article 109536. https://www.sciencedirect.com/science/article/pii/S0165011425002751

Citations: 0

Abstract

Big data analytics has gained substantial attention over traditional data-processing methods because it excels at uncovering hidden patterns and correlations within massive datasets, commonly referred to as big data. Advances in information technology and the rapid expansion of the web have significantly increased the volume of data generated and used in everyday life, and traditional methods often struggle with efficiency and accuracy. Addressing these challenges matters because big data is transforming domains from research to real-world applications, where accurate analysis is critical. This study introduces White Shark Beetle Optimizer + Deep Fuzzy Clustering (WSBO+DFC), a method designed to process and analyze big data efficiently. First, the input data is retrieved from the database and passed to the MapReduce framework, which has two phases: the mapper phase and the reducer phase. In the mapper phase, key-value pairs are generated from the dataset, giving structure to the previously unstructured data. This phase consists of multiple mappers, in which feature selection is performed using Support Vector Machine Recursive Feature Elimination (SVM-RFE). The proposed White Shark Beetle Optimizer (WSBO) is used to optimize the weight parameters of the SVM. In the reducer phase, all the selected features are merged. The fused features are then clustered using Deep Fuzzy Clustering (DFC). The weight update process within DFC is guided by WSBO, which is developed by integrating the White Shark Optimizer (WSO) and the Dung Beetle Optimizer (DBO). The developed method achieved a maximum accuracy of 89.87% and a maximum DB index of 0.995.
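The pipeline described in the abstract (per-mapper feature selection, a reducer that merges the selections, then fuzzy clustering of the fused features) can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It rests on several assumptions: the SVM is trained with plain subgradient descent and the clustering is standard fuzzy c-means, both standing in for the WSBO-guided weight updates and for DFC, whose details the abstract does not give. All function names, parameters, and the toy data are hypothetical.

```python
import math

def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM weights via subgradient descent on the L2-regularised hinge loss.
    Assumption: stands in for the WSBO-tuned SVM weights of the paper."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w

def svm_rfe(X, y, keep):
    """Recursive feature elimination: retrain, then drop the feature with the
    smallest squared weight, until `keep` features remain."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y)
        active.pop(min(range(len(active)), key=lambda j: w[j] ** 2))
    return active

def mapper(partition, keep):
    """Emit (feature_index, 1) pairs for the features selected on one partition."""
    X, y = partition
    return [(j, 1) for j in svm_rfe(X, y, keep)]

def reducer(pairs):
    """Merge all mappers' selections into one sorted list of feature indices."""
    return sorted({j for j, _ in pairs})

def fuzzy_c_means(points, centers, m=2.0, iters=30):
    """Standard fuzzy c-means from explicit initial centers.
    Assumption: a plain stand-in for Deep Fuzzy Clustering (DFC)."""
    c, dim = len(centers), len(points[0])
    for _ in range(iters):
        U = []
        for p in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            d = [max(math.dist(p, v), 1e-12) for v in centers]
            U.append([1.0 / sum((d[i] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for i in range(c)])
        centers = [[sum(u[i] ** m * p[j] for u, p in zip(U, points)) /
                    sum(u[i] ** m for u in U) for j in range(dim)]
                   for i in range(c)]
    return centers, U

# Toy data: two partitions, four features; feature 3 carries no signal anywhere.
part_a = ([[2, 0, 1, 0], [1.5, 0, 2, 0], [-2, 0, -1, 0], [-1.5, 0, -2, 0]], [1, 1, -1, -1])
part_b = ([[2, 1, 0, 0], [1, 2, 0, 0], [-2, -1, 0, 0], [-1, -2, 0, 0]], [1, 1, -1, -1])

pairs = [kv for part in (part_a, part_b) for kv in mapper(part, keep=2)]
selected = reducer(pairs)
rows = [[row[j] for j in selected] for X, _ in (part_a, part_b) for row in X]
centers, U = fuzzy_c_means(rows, centers=[rows[0], rows[2]])
hard = [max(range(2), key=lambda i: u[i]) for u in U]
print(selected, hard)  # prints [0, 1, 2] [0, 0, 1, 1, 0, 0, 1, 1]
```

In this sketch each mapper keeps two features and the reducer takes their union, which matches the abstract's statement that all selected features are merged. Other merge rules, such as majority voting across mappers, would be a different design choice.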


Source journal

Fuzzy Sets and Systems (Mathematics; Computer Science: Theory & Methods)

CiteScore

6.50

Self-citation rate

17.90%

Articles published

321

Review time

6.1 months

About the journal: Since its launch in 1978, the journal Fuzzy Sets and Systems has been devoted to the international advancement of the theory and application of fuzzy sets and systems. The theory of fuzzy sets now encompasses a well-organized corpus of basic notions including (but not restricted to) aggregation operations, a generalized theory of relations, specific measures of information content, and a calculus of fuzzy numbers. Fuzzy sets are also the cornerstone of a non-additive uncertainty theory, namely possibility theory, and of a versatile tool for both linguistic and numerical modeling: fuzzy rule-based systems. Numerous works now combine fuzzy concepts with other scientific disciplines as well as modern technologies. In mathematics, fuzzy sets have triggered new research topics in connection with category theory, topology, algebra, and analysis. Fuzzy sets are also part of a recent trend in the study of generalized measures and integrals, and are combined with statistical methods. Furthermore, fuzzy sets have strong logical underpinnings in the tradition of many-valued logics.