Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper

H. Kriegel, Eirini Ntoutsi
Journal: SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, vol. 15, no. 1, pp. 1-8
Published: 2014-06-16
Publication type: Journal Article
DOI: https://doi.org/10.1145/2641190.2641192
Citations: 6

Abstract

This position paper aims to clarify the commonalities and differences between subspace clustering and text clustering. Text data is often presented as an ideal fit for subspace clustering because it is high dimensional and sparse. Indeed, the two areas share similar challenges and the same goal: the simultaneous extraction of both the clusters and the dimensions in which these clusters are defined. However, they differ fundamentally with respect to object feature representation, dimension weighting, and the incorporation of these weights into the dissimilarity computation. We attempt to bridge the two domains in order to facilitate the exchange of ideas and best practices between them.
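
To make the contrast in dimension weighting concrete, the following is a minimal sketch and not the authors' method. It assumes that the text side uses a single global, IDF-style weight per term inside a cosine dissimilarity, and that the subspace side uses a separate weight vector per cluster inside a weighted Euclidean distance. The abstract does not name these specific schemes, and the function names `idf_weights`, `cosine_dissimilarity`, and `subspace_distance` are hypothetical.

```python
import math


def idf_weights(docs):
    """Global per-term weights (assumed scheme): log(N / document frequency)."""
    n = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    return {t: math.log(n / sum(1 for doc in docs if t in doc)) for t in vocab}


def cosine_dissimilarity(a, b, weights):
    """1 - cosine similarity of two term-count dicts after global weighting."""
    wa = {t: c * weights[t] for t, c in a.items()}
    wb = {t: c * weights[t] for t, c in b.items()}
    dot = sum(v * wb.get(t, 0.0) for t, v in wa.items())
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (na * nb)


def subspace_distance(x, y, cluster_weights):
    """Weighted Euclidean distance using one cluster's own dimension weights."""
    return math.sqrt(
        sum(w * (xi - yi) ** 2 for w, xi, yi in zip(cluster_weights, x, y))
    )


# Text side: one weight per term, shared by every document and cluster.
docs = [
    {"cluster": 2, "data": 1},
    {"cluster": 1, "text": 3},
    {"data": 1, "text": 1},
]
weights = idf_weights(docs)
text_d = cosine_dissimilarity(docs[0], docs[1], weights)

# Subspace side: the same pair of points is compared differently
# depending on which cluster's dimension weights are applied.
x, y = (1.0, 5.0), (1.5, 0.0)
d_cluster_a = subspace_distance(x, y, (1.0, 0.0))  # only dimension 0 is relevant
d_cluster_b = subspace_distance(x, y, (0.0, 1.0))  # only dimension 1 is relevant
```

In this sketch the text dissimilarity depends only on the corpus-wide weights, whereas the subspace distance between the same two points changes with the cluster whose weights are used, which is one way to read the difference in how the weights enter the dissimilarity computation.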