Fast k-means Clustering Based on the Neighbor Information

2021 International Symposium on Electrical, Electronics and Information Engineering. Pub Date: 2021-02-19. DOI: 10.1145/3459104.3459194

Daowan Peng, Zizhong Chen, Jingcheng Fu, Shuyin Xia, Qing Wen

Citations: 8

Abstract

The k-means algorithm has been widely used for several decades, but the efficiency of Lloyd's k-means algorithm drops sharply on large-scale data. To address this problem, this paper proposes a fast k-means algorithm based on neighbor information. First, we propose a localization strategy for the reassignment step of k-means, which greatly reduces the number of distance calculations. Second, we propose a neighbor update strategy, which finds more accurate neighbors for each cluster in each iteration and thereby preserves clustering quality when the k-means algorithm converges. Evaluated on multiple real-world datasets, the proposed algorithm achieved speedups of up to hundreds of times while losing only about 1.10% in clustering quality.
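The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how neighbors are defined or maintained, so everything below is an assumption made for illustration, not the authors' implementation: the function name `neighbor_kmeans`, the parameter `n_neighbors`, and the choice to treat the nearest other centroids (by centroid-to-centroid distance) as a cluster's neighbors are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_kmeans(points, k, n_neighbors, max_iter=100, seed=0):
    """Illustrative k-means with localized reassignment (assumed design).

    n_neighbors is the number of *other* clusters each cluster keeps as
    neighbors; a point is only ever compared against its current cluster
    and that cluster's neighbors.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(p) for p in rng.sample(points, k)]

    # One full Lloyd-style assignment to get initial labels.
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]

    for _ in range(max_iter):
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(p[d] for p in members) / len(members)
                              for d in range(dim)]

        # Neighbor update (assumption): recompute each cluster's neighbor
        # list from the current centers in every iteration.
        neighbors = []
        for c in range(k):
            others = sorted((j for j in range(k) if j != c),
                            key=lambda j: sq_dist(centers[c], centers[j]))
            neighbors.append([c] + others[:n_neighbors])

        # Localized reassignment: compare each point only against the
        # candidate centers of its current cluster, not all k centers.
        changed = False
        for i, p in enumerate(points):
            best = min(neighbors[labels[i]], key=lambda j: sq_dist(p, centers[j]))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break

    return labels, centers
```

In this sketch each reassignment pass costs roughly `n * (n_neighbors + 1)` distance evaluations instead of `n * k`, which is where a saving would come from when `n_neighbors` is much smaller than `k`. A small usage example: `labels, centers = neighbor_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2, n_neighbors=1)`.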


