A MapReduce-based approach to social network big data mining

Impact factor: 0.4 (JCR Q4, ENGINEERING, MULTIDISCIPLINARY)

Journal of Computational Methods in Sciences and Engineering. Published 2023-06-20. DOI: 10.3233/jcm-226903

Fuli Qi

Vol. 23, pp. 2535-2547

Abstract

The rapid development of social networks has made it more convenient for users to receive information. As a communication platform in everyday use, microblogs hold vast amounts of data. To address the low efficiency and poor clustering performance of the K-means algorithm, this study develops a parallel K-means clustering algorithm based on the MapReduce model. To ease the difficulty of computing similarity between microblog topic texts, the vector space model and semantic similarity are used together to measure the similarity between texts, improving the quality of microblog text classification. Measurements of the data expansion rate on the corresponding nodes for different data sets show that the parallel K-means algorithm reaches an average expansion rate of 0.89 and has the highest running rate. The results also show that the parallel K-means algorithm has good clustering stability and the highest clustering quality, reaching 1.24. Its clustering time is the shortest, averaging 1.27 minutes, so both its clustering effectiveness and its efficiency are the best. In the quality analysis of Weibo topic recommendation, parallel K-means (P-K-means) recommendation achieves an accuracy of 95.64% and a user satisfaction of 98.64%, again the best recommendation performance. These results indicate that the MapReduce-based parallel K-means clustering algorithm performs best in microblog topic mining and recommendation: it can efficiently recommend topics of interest to users and improve their microblogging experience.
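
The abstract does not describe how K-means is split across MapReduce. A common decomposition, shown below as a minimal single-process sketch (not the author's implementation), runs one MapReduce job per iteration: mappers assign each point to its nearest centroid, a combiner pre-sums points per cluster within each input split, and the reducer merges those partial sums into new centroids. The function names (`map_phase`, `combine`, `reduce_phase`, `parallel_kmeans`) and the in-memory list of `splits` are hypothetical stand-ins for a real Hadoop job, which is not shown.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]
Partial = Dict[int, Tuple[List[float], int]]  # cluster id -> (coordinate sums, point count)


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the closest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(split: Iterable[Vector], centroids: Sequence[Vector]) -> Iterator[Tuple[int, Tuple[Vector, int]]]:
    # Hypothetical mapper: key every point in this split by its nearest centroid.
    for point in split:
        yield nearest(point, centroids), (point, 1)


def combine(pairs: Iterable[Tuple[int, Tuple[Vector, int]]]) -> Partial:
    # Combiner: sum points per cluster locally so only k partial sums leave each split.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (vec, n) in pairs:
        acc = sums.setdefault(cid, [0.0] * len(vec))
        for j, x in enumerate(vec):
            acc[j] += x
        counts[cid] += n
    return {cid: (sums[cid], counts[cid]) for cid in sums}


def reduce_phase(partials: Iterable[Partial], old: Sequence[Vector]) -> List[Vector]:
    # Reducer: merge partial sums from all splits (grouping done by the shuffle
    # in Hadoop), then divide by the counts to obtain the new centroids.
    totals: Partial = {}
    for part in partials:
        for cid, (s, n) in part.items():
            ts, tn = totals.get(cid, ([0.0] * len(s), 0))
            totals[cid] = ([a + b for a, b in zip(ts, s)], tn + n)
    new: List[Vector] = []
    for cid, c in enumerate(old):
        if cid in totals:
            s, n = totals[cid]
            new.append(tuple(x / n for x in s))
        else:
            new.append(tuple(c))  # empty cluster: keep its previous centroid
    return new


def parallel_kmeans(splits: Sequence[Sequence[Vector]], init: Sequence[Vector],
                    max_iter: int = 20, tol: float = 1e-6) -> List[Vector]:
    # Driver: one map/combine/reduce round per iteration until centroids stop moving.
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        partials = [combine(map_phase(split, centroids)) for split in splits]
        new = reduce_phase(partials, centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)], [(1.0, 0.0), (11.0, 10.0)]]
    print(parallel_kmeans(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

The combiner is the main reason this layout scales: each split sends at most k (sum, count) pairs to the reducer instead of every point, so shuffle traffic does not grow with the size of the microblog corpus.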

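The abstract combines the vector space model with semantic similarity but does not name the semantic measure or the way the two scores are merged. A minimal sketch, assuming a linear blend with a hypothetical weight `alpha`: TF-IDF cosine similarity supplies the vector space part, and a toy concept table (`concept_of`, hypothetical) stands in for a real lexical or semantic resource, so that synonyms such as "phone" and "mobile" count as matches. Tokens are assumed to be already word-segmented, which Chinese microblog text would require.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def tfidf_vectors(docs: Sequence[List[str]]) -> List[Dict[str, float]]:
    """Vector space model: one sparse TF-IDF vector per pre-segmented document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps weights positive even for terms present in every document.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(u: Mapping[str, float], v: Mapping[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_sim(a: Sequence[str], b: Sequence[str], concept_of: Mapping[str, str]) -> float:
    """Hypothetical stand-in: Jaccard overlap after mapping words to concepts."""
    ca = {concept_of.get(w, w) for w in a}
    cb = {concept_of.get(w, w) for w in b}
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def combined_similarity(vec_a, vec_b, words_a, words_b, concept_of, alpha: float = 0.5) -> float:
    # Assumed linear blend of the two scores; alpha is a hypothetical tuning weight.
    return alpha * cosine(vec_a, vec_b) + (1.0 - alpha) * semantic_sim(words_a, words_b, concept_of)


if __name__ == "__main__":
    docs = [["phone", "battery", "dies"], ["mobile", "battery", "drains"], ["football", "match", "tonight"]]
    concept_of = {"phone": "PHONE", "mobile": "PHONE", "dies": "DRAIN", "drains": "DRAIN"}
    vecs = tfidf_vectors(docs)
    print(combined_similarity(vecs[0], vecs[1], docs[0], docs[1], concept_of))
    print(combined_similarity(vecs[0], vecs[2], docs[0], docs[2], concept_of))
```

In this toy run the first two posts share only "battery" at the surface level, so pure cosine similarity scores them low; the concept mapping raises their combined score, which is the kind of gap the semantic term is meant to close for short microblog texts.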
Source journal

Journal of Computational Methods in Sciences and Engineering (subject category: ENGINEERING, MULTIDISCIPLINARY)

CiteScore: 0.80

Self-citation rate: 0.00%

Articles published: 152

Journal description: The major goal of the Journal of Computational Methods in Sciences and Engineering (JCMSE) is the publication of new research results on computational methods in sciences and engineering. Common experience has taught us that computational methods originally developed in one basic science, e.g. physics, can be of paramount importance to neighboring sciences, e.g. chemistry, as well as to engineering or technology and, in turn, to society as a whole. The JCMSE will continuously and systematically encourage this beneficial practice of interdisciplinary interaction.