Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper

SIGKDD explorations: newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining. Pub Date: 2014-06-16. DOI: 10.1145/2641190.2641192

H. Kriegel, Eirini Ntoutsi

Citations: 6

Abstract

The goal of this position paper is to contribute to a clear understanding of the commonalities and differences between subspace clustering and text clustering. Text data is often put forward as an ideal fit for subspace clustering because of its high dimensionality and sparsity. Indeed, the two areas share similar challenges and the same goal: the simultaneous extraction of both the clusters and the dimensions in which these clusters are defined. However, there are fundamental differences between the two areas with respect to object feature representation, dimension weighting, and the incorporation of these weights into the dissimilarity computation. We make an attempt to bridge these two domains in order to facilitate the exchange of ideas and best practices between them.
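To make the weighting contrast named in the abstract more concrete, here is a minimal sketch (Python, standard library only) of two common conventions. It is an illustration under stated assumptions, not the method proposed or analyzed in this paper. Text clustering is represented by tf-idf term weights combined with cosine similarity. Subspace clustering is represented by hypothetical per-cluster dimension weights, here set inversely proportional to within-cluster variance (an assumed scheme chosen only for simplicity) and plugged into a weighted Euclidean distance. All function names (`tfidf_vectors`, `cosine`, `cluster_dimension_weights`, `weighted_distance`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Text-clustering view: each term (dimension) gets a weight derived from the
    document (tf) and from the whole corpus (idf), independent of any cluster."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Sparse representation: only terms present in the document are stored.
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity on sparse dict vectors; only shared terms add to the dot product."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_dimension_weights(points, eps=1e-9):
    """Subspace-clustering view (assumed scheme): weights are computed per cluster,
    favoring dimensions where the cluster members agree (low variance).
    Returns the cluster center and weights normalized to sum to 1."""
    m, d = len(points), len(points[0])
    center = [sum(p[j] for p in points) / m for j in range(d)]
    variances = [sum((p[j] - center[j]) ** 2 for p in points) / m for j in range(d)]
    raw = [1.0 / (var + eps) for var in variances]
    total = sum(raw)
    return center, [r / total for r in raw]


def weighted_distance(x, center, weights):
    """Weighted Euclidean distance: the dimension weights enter the dissimilarity directly."""
    return math.sqrt(sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights)))


if __name__ == "__main__":
    docs = [["subspace", "cluster", "dimension"],
            ["text", "cluster", "term"],
            ["text", "term", "sparse"]]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[1], vecs[2]))  # shares "text" and "term"
    print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0

    center, weights = cluster_dimension_weights([[1.0, 0.0], [1.1, 5.0], [0.9, -5.0]])
    print(weighted_distance([1.0, 10.0], center, weights))  # step along the noisy dimension
    print(weighted_distance([11.0, 0.0], center, weights))  # step along the coherent dimension
```

The design difference the sketch tries to surface: the idf factor is global, so a term carries the same weight in every comparison, whereas in the subspace-style distance each cluster carries its own weight vector, so the same point can be close under one cluster's weights and far under another's.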

