SPARCL: Efficient and Effective Shape-Based Clustering

2008 Eighth IEEE International Conference on Data Mining. Pub Date: 2008-12-15. DOI: 10.1109/ICDM.2008.73

V. Chaoji, M. Hasan, Saeed Salem, Mohammed J. Zaki


Citations: 33

Abstract

Clustering is one of the fundamental data mining tasks. Many clustering paradigms have been developed over the years, including partitional, hierarchical, mixture-model-based, density-based, spectral, and subspace methods. This paper focuses on full-dimensional, arbitrarily shaped clusters. Existing methods for this problem suffer from high memory or time complexity (quadratic or even cubic), which has restricted them to datasets of moderate size. We propose SPARCL, a simple and scalable algorithm for finding clusters of arbitrary shapes and sizes, with linear space and time complexity. SPARCL consists of two stages: the first runs a carefully initialized version of the K-means algorithm to generate many small seed clusters; the second iteratively merges the generated clusters to obtain the final shape-based clusters. Experiments on a variety of datasets highlight the effectiveness, efficiency, and scalability of the approach. On large datasets, SPARCL is an order of magnitude faster than the best existing approaches.
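The two-stage idea in the abstract (many small K-means seed clusters, then iterative merging) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the initialization or the merge criterion, so the farthest-first initialization and the closest-centroid merging below are stand-in assumptions, and all function names (`farthest_first_init`, `kmeans_seeds`, `merge_seeds`, `sparcl_sketch`) are hypothetical. The merge step here is quadratic in the number of seeds (not in the number of points) and makes no claim to match the paper's complexity bounds.

```python
import math
import random


def farthest_first_init(points, k, rng):
    """Pick k spread-out starting centers (an assumed initialization)."""
    centers = [points[rng.randrange(len(points))]]
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        i = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[i])
        nearest = [min(d, math.dist(p, points[i])) for d, p in zip(nearest, points)]
    return centers


def kmeans_seeds(points, k, iters=10, seed=0):
    """Stage 1: Lloyd's K-means producing k small seed clusters."""
    centers = farthest_first_init(points, k, random.Random(seed))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a seed cluster empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def merge_seeds(centers, n_clusters):
    """Stage 2 (assumed criterion): repeatedly merge the two groups whose
    seed centroids are closest, until n_clusters groups remain."""
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    pairs = sorted(
        (math.dist(centers[i], centers[j]), i, j)
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    remaining = len(centers)
    for _, i, j in pairs:
        if remaining == n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            remaining -= 1
    return [find(i) for i in range(len(centers))]


def sparcl_sketch(points, n_seeds, n_clusters, seed=0):
    """Run both stages and return one merged-cluster label per point."""
    centers, seed_labels = kmeans_seeds(points, n_seeds, seed=seed)
    groups = merge_seeds(centers, n_clusters)
    return [groups[s] for s in seed_labels]


# Toy data: two elongated segments, 41 points each, 50 units apart.
points = [(float(x), 0.0) for x in range(41)] + [(float(x), 50.0) for x in range(41)]
labels = sparcl_sketch(points, n_seeds=8, n_clusters=2)
```

On this toy input each segment is covered by several seed clusters, and merging by centroid proximity chains the seeds along a segment before any merge across the gap, so each segment ends up with a single label. A similarity measure that accounts for the points lying between two seeds, rather than centroid distance alone, would be needed for harder shapes.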
