A parameterless method for efficiently discovering clusters of arbitrary shape in large datasets

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date: 2002-12-09. DOI: 10.1109/ICDM.2002.1183901

Andrew Foss, Osmar R Zaiane

Citations: 53

Abstract

Clustering is the problem of grouping data by similarity: it seeks to maximize intra-group similarity while minimizing inter-group similarity. Clustering is also known as unsupervised classification, since no class labels are given. However, all existing clustering algorithms require some parameters to steer the clustering process, such as the well-known k for the number of expected clusters, which constitutes supervision of a sort. In this paper we present a new, efficient, and scalable clustering algorithm that clusters over a range of resolutions and finds a potential optimum clustering without requiring any parameter input. Our experiments show that the algorithm outperforms most existing clustering algorithms in quality and speed on large data sets.



