MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce

2016 Second International Conference on Computational Intelligence & Communication Technology (CICT). Pub Date: 2016-02-01. DOI: 10.1109/CICT.2016.46

N. Akthar, Mohd Vasim Ahamad, Shahbaaz Ahmad


Citations: 14

Abstract



Digital data is now produced and exchanged faster than ever before, yet it has no value until useful information is extracted from it. Traditional database management techniques are impractical and inefficient for big data, which is why big data technologies such as Hadoop came into existence. Hadoop is an open-source framework that can process huge amounts of data in parallel. Data mining techniques can be used to extract useful information, and among them clustering is the most popular. Clustering places similar data objects in the same group and scatters dissimilar objects across different groups. K-Means is one such clustering algorithm: traditional K-Means tries to assign n data objects to k clusters, starting from random initial centers. Experiments show that data mining results are inefficient and unstable when random initial centers are used. In this paper, we modify the traditional K-Means clustering algorithm by using improved initial centers. We propose several methods for calculating the initial centers and compare their results.
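As a rough illustration of the idea in the abstract, the sketch below expresses one K-Means iteration as a map step (assign each point to its nearest center) and a reduce step (average the points of each cluster), and replaces random seeding with a deterministic one. It is plain Python rather than a Hadoop job, and the seeding rule in `initial_centers` is a hypothetical stand-in chosen for illustration; the abstract does not say which initialization methods the authors actually propose. All function names are assumptions.

```python
from collections import defaultdict
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def initial_centers(points, k):
    """Hypothetical deterministic seeding (NOT the paper's method):
    sort points by distance from the origin, split them into k chunks,
    and use each chunk's mean as a starting center. Assumes k <= len(points)."""
    origin = (0.0,) * len(points[0])
    ordered = sorted(points, key=lambda p: dist(p, origin))
    size = len(ordered) // k
    chunks = [ordered[i * size:(i + 1) * size] for i in range(k - 1)]
    chunks.append(ordered[(k - 1) * size:])  # last chunk takes the remainder
    return [mean(chunk) for chunk in chunks]


def map_point(point, centers):
    """Map step: emit (index of the nearest center, point)."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: the new center is the mean of the assigned points."""
    return mean(points)


def kmeans(points, k, max_iter=100):
    """Repeat map and reduce until the centers stop changing."""
    centers = initial_centers(points, k)
    for _ in range(max_iter):
        groups = defaultdict(list)  # stands in for the shuffle phase
        for p in points:
            key, value = map_point(p, centers)
            groups[key].append(value)
        # An empty cluster keeps its previous center.
        new_centers = [
            reduce_cluster(groups[i]) if groups[i] else centers[i]
            for i in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Because the seeding is deterministic, repeated runs on the same data start from the same centers, which is the kind of stability that random initialization lacks. In an actual Hadoop job the map and reduce functions would run on separate nodes and the current centers would be shared with every mapper at the start of each iteration.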

