Large-scale Data Mining Method based on Clustering Algorithm Combined with MAPREDUCE

Transactions on Computer Science and Intelligent Systems Research. Pub Date: 2023-12-21. DOI: 10.62051/8p9b3106

Yulun Zhang, Chenxu Zhang, Lei Yang, Hongyang Li

Citations: 0

Abstract

As information technology continues to develop, both the diversity and the volume of data keep growing. Mining this text data effectively to extract valuable content has become an urgent task in data research. This study combines the MapReduce distributed computing framework with the K-means clustering algorithm to meet the challenges of large-scale data mining. The paper also uses a distributed caching mechanism so that multiple cooperating MapReduce jobs do not repeatedly request the same resources, which improves data mining efficiency. Combining MapReduce's distributed computing with the strengths of K-means provides an efficient and scalable method for large-scale data mining. Experimental results on both internal and external evaluation indicators show that the combination takes full advantage of MapReduce's distributed, parallel computation, giving users an efficient and scalable data mining tool. Through this research, the paper offers new methods and insights for large-scale data mining and improves its efficiency and accuracy.
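
The abstract does not give implementation details, so the following is only a minimal single-process sketch of how K-means is commonly expressed as MapReduce rounds, not the authors' code. All names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) are hypothetical. The `centroids` argument handed to every mapper stands in for the small, read-only data that a distributed cache would ship to each node once per iteration; the stopping tolerance and the choice to keep an empty cluster's old centroid are assumptions as well.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(split: Sequence[Point],
              centroids: Sequence[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Mapper: for each point in one input split, emit (nearest centroid index, (point, 1)).

    `centroids` plays the role of the cached, read-only centroid list.
    """
    for p in split:
        idx = min(range(len(centroids)),
                  key=lambda i: squared_distance(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(values: Sequence[Tuple[Point, int]]) -> Point:
    """Reducer: average all points that were assigned to one centroid."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for p, n in values:
        total = [t + x for t, x in zip(total, p)]
        count += n
    return tuple(t / count for t in total)


def kmeans_mapreduce(splits: Sequence[Sequence[Point]],
                     centroids: Sequence[Point],
                     max_iter: int = 20,
                     tol: float = 1e-9) -> List[Point]:
    """Run K-means as repeated map -> shuffle -> reduce rounds (simulated locally)."""
    current = list(centroids)
    for _ in range(max_iter):
        # Shuffle step: group mapper output by centroid index.
        grouped: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
        for split in splits:
            for key, value in map_phase(split, current):
                grouped[key].append(value)
        # Assumption: a centroid with no assigned points keeps its old position.
        updated = list(current)
        for key, values in grouped.items():
            updated[key] = reduce_phase(values)
        shift = max(squared_distance(a, b) for a, b in zip(current, updated))
        current = updated
        if shift <= tol:
            break
    return current


if __name__ == "__main__":
    splits = [[(1.0, 1.0), (1.0, 2.0)], [(9.0, 9.0), (10.0, 9.0)]]
    print(kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real cluster each split would be processed by a separate mapper task, and only the centroid list would need to be redistributed between iterations, which is where a caching mechanism of the kind the abstract mentions would matter.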
