Big data clustering using fractional sail fish-sparse fuzzy C-means and particle whale optimization based MapReduce framework

Web Intell. Pub Date: 2022-07-20. DOI: 10.3233/web-210490

Omkaresh Kulkarni, Ravi Sankar Vadali


Citation count: 0

Abstract

Data clustering, the process of retrieving essential information from a dataset, is a significant data mining approach. In recent decades, nature-inspired optimization algorithms have been designed to solve optimization problems, particularly those arising in data clustering. However, existing methods are not feasible for large amounts of data, because the execution time of traditional approaches is too long. Hence, an efficient and optimal data clustering scheme is designed using the devised Fractional Sail Fish-Sparse Fuzzy C-Means + Particle Whale Optimization (FSF-Sparse FCM + PWO) based MapReduce Framework (MRF) to process high-dimensional data. The proposed FSF-Sparse FCM integrates Sail Fish Optimization (SFO), the fractional concept, and Sparse FCM. The proposed MRF comprises two functions, a mapper and a reducer, which together perform the data clustering. FSF-Sparse FCM is employed in the mapper phase to compute the cluster centroids, which form the intermediate data. The intermediate data is tuned in the reducer phase using Particle Whale Optimization (PWO), which integrates Particle Swarm Optimization (PSO) and the Whale Optimization Algorithm (WOA). Accordingly, the optimal cluster centroid is computed in the reducer phase using an objective function based on the DB-Index. The proposed FSF-Sparse FCM + PWO obtained the highest accuracy of 0.903 and the lowest DB-Index of 39.07.
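The mapper/reducer split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain fuzzy C-means stands in for FSF-Sparse FCM, and a simple "pick the candidate with the lowest Davies-Bouldin index" step stands in for the PWO search. The function names (`fuzzy_c_means`, `davies_bouldin`, `mapper`, `reducer`), the synthetic data, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means (a stand-in for FSF-Sparse FCM); returns c centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update: mean of the points weighted by u^m.
        w = u ** m
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]
    return centroids

def davies_bouldin(X, centroids):
    """Davies-Bouldin index using nearest-centroid assignment (lower is better)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    k = len(centroids)
    # Mean distance of each cluster's points to its centroid; empty clusters get 0.
    scatter = np.array([
        d[labels == i, i].mean() if np.any(labels == i) else 0.0
        for i in range(k)
    ])
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
    np.fill_diagonal(sep, np.inf)  # exclude i == j from the maximum
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()

def mapper(split, c):
    """Mapper phase: cluster one data split and emit its centroids."""
    return fuzzy_c_means(split, c)

def reducer(candidates, sample):
    """Reducer phase (stand-in for PWO): keep the centroid set with the lowest DB index."""
    scores = [davies_bouldin(sample, cand) for cand in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores[best]

# Synthetic data: three Gaussian blobs (hypothetical, for illustration only).
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([ctr + rng.normal(scale=0.5, size=(200, 2)) for ctr in true_centers])
rng.shuffle(X)

splits = np.array_split(X, 4)                   # one split per mapper
candidates = [mapper(s, c=3) for s in splits]   # intermediate data
best_centroids, best_db = reducer(candidates, X)
print(best_centroids, best_db)
```

In this sketch each mapper works only on its own split, so the mappers could run in parallel; the reducer sees only the small centroid sets plus whatever data sample is used to score them. A real PWO reducer would search the centroid space rather than choose among fixed candidates.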

