MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce

N. Akthar, Mohd Vasim Ahamad, Shahbaaz Ahmad
2016 Second International Conference on Computational Intelligence & Communication Technology (CICT), February 2016
DOI: 10.1109/CICT.2016.46
Citations: 14

Abstract

In today's digital world, data flows in and out faster than ever before. This data is of no use until useful content is extracted from it. However, traditional database management techniques are impractical and inefficient for big data, which is why big data technologies such as Hadoop came into existence. Hadoop is an open-source framework that can process huge amounts of data in parallel. To extract useful information from that data, data mining techniques can be used. Among the many data mining techniques, clustering is one of the most popular. Clustering places similar data in the same group and scatters dissimilar data across different groups. The K-Means algorithm is one such clustering technique. Traditional K-Means assigns n data objects to k clusters, starting from randomly chosen initial centers. Experiments show that data mining results are inefficient and unstable when random initial centers are used. In this paper, we modify the traditional K-Means clustering algorithm to use improved initial centers. We propose several methods for calculating the initial centers and compare their results.
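The abstract does not spell out the authors' Hadoop job layout or their initial-center methods, so the following is only a rough illustration under stated assumptions. It simulates, in plain Python without Hadoop, one common way to express a K-Means iteration in MapReduce terms: the mapper assigns each point to its nearest center, a shuffle groups the output by cluster id, and the reducer averages each group into a new center, with a driver loop repeating rounds until the centers stop moving. The two seeding functions are illustrative stand-ins, not the paper's methods: `random_init` is the random-center baseline the abstract criticizes, and `farthest_point_init` is a standard deterministic maximin (farthest-point) heuristic shown only for contrast. All function names are hypothetical.

```python
import random
from collections import defaultdict


def dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: dist2(point, centers[i]))


def map_phase(points, centers):
    """Mapper: emit (cluster_id, (partial_sum, count)) for each input point.
    A single point is its own partial sum with count 1."""
    for p in points:
        yield nearest(p, centers), (p, 1)


def reduce_phase(values, old_center):
    """Reducer for one cluster: add partial sums and counts, then average."""
    if not values:
        return old_center  # empty cluster: keep the previous center
    total = [0.0] * len(old_center)
    count = 0
    for partial_sum, n in values:
        for j, x in enumerate(partial_sum):
            total[j] += x
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centers):
    """One MapReduce round: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for cid, value in map_phase(points, centers):
        groups[cid].append(value)  # stands in for Hadoop's shuffle/sort
    return [reduce_phase(groups.get(cid, []), c) for cid, c in enumerate(centers)]


def random_init(points, k, seed=None):
    """Baseline seeding: k distinct input points chosen at random."""
    return random.Random(seed).sample(points, k)


def farthest_point_init(points, k):
    """Illustrative deterministic seeding (maximin): start from the first point,
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans(points, initial_centers, max_iter=100, tol=1e-12):
    """Driver loop: repeat MapReduce rounds until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        shift = max(dist2(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(data, farthest_point_init(data, 2)))  # about [(0.33, 0.33), (10.33, 10.33)]
```

Because the mapper emits (partial sum, count) pairs rather than bare points, a combiner could pre-aggregate each mapper's output before the shuffle; this sketch does not implement one. Since seeding is a separate argument to `kmeans`, the same driver can be run with `random_init` under several seeds and with a deterministic seeding to compare how stable the final centers are.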