A K-means clustering with optimized initial center based on Hadoop platform

2014 9th International Conference on Computer Science & Education. Pub Date: 2014-10-16. DOI: 10.1109/ICCSE.2014.6926466

Kunhui Lin, Xiang Li, Zhongnan Zhang, Jiahong Chen

Citations: 17

Abstract

With the explosive growth of data, traditional clustering algorithms running on standalone servers can no longer meet demand. To solve this problem, more and more researchers implement traditional clustering algorithms, K-means in particular, on cloud computing platforms. Yet few researchers pay attention to the structure of K-means clustering itself; most optimize the model of the cloud computing platform to raise the computing speed of K-means clustering. The instability caused by random initial centers therefore still exists. In this paper, we propose a K-means clustering algorithm with optimized initial centers based on data dimensional density. This method avoids the deficiency of random initial centers and improves the stability of K-means clustering. The experimental results show that the approach performs well and improves the accuracy of K-means clustering on the test set.
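To make the idea of density-driven initial centers concrete, the following is a minimal single-machine sketch in Python. It is not the authors' algorithm: the abstract does not say how "data dimensional density" is computed, so the sketch assumes a simple neighbor-count density within a hypothetical `radius` parameter, and it picks each further center by a density-times-distance score. The function names (`density_init`, `kmeans`) and the toy data are illustrative assumptions, and no Hadoop or MapReduce code is involved.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_init(points, k, radius):
    """Deterministically pick k initial centers (assumed heuristic).

    Density of a point = number of points within `radius` of it.
    The first center is the densest point; each further center
    maximizes density * distance to the nearest chosen center.
    """
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            nearest = min(dist(points[i], points[c]) for c in chosen)
            return density[i] * nearest
        chosen.append(max(range(len(points)), key=score))
    return [points[i] for i in chosen]

def kmeans(points, centers, iters=100):
    """Standard Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Toy data: two well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
init = density_init(points, k=2, radius=1.5)
centers, clusters = kmeans(points, init)
print(init, centers)
```

Because the initial centers depend only on the data and `radius`, repeated runs start from the same centers and therefore converge to the same clustering, which is the kind of stability the abstract contrasts with random initialization.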



