Clustering high dimensional data: examining differences and commonalities between subspace clustering and text clustering - a position paper
H. Kriegel, Eirini Ntoutsi
SIGKDD Explorations (newsletter of the Special Interest Group on Knowledge Discovery & Data Mining), vol. 15, no. 1, pp. 1-8, published 2014-06-16
DOI: 10.1145/2641190.2641192 (https://doi.org/10.1145/2641190.2641192)
The goal of this position paper is to contribute to a clear understanding of the commonalities and differences between subspace clustering and text clustering. Text data is often touted as an ideal fit for subspace clustering because of its high dimensionality and sparsity. Indeed, subspace clustering and text clustering face similar challenges and share the same goal: the simultaneous extraction of clusters and of the dimensions in which these clusters are defined. However, the two areas differ fundamentally with respect to how objects are represented as features, how dimensions are weighted, and how these weights are incorporated into the dissimilarity computation. We attempt to bridge these two domains in order to facilitate the exchange of ideas and best practices between them.
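The abstract names dimension weighting and its use in the dissimilarity computation as key points of divergence. The minimal Python sketch below may make that contrast concrete. On the text side, it assumes one common tf-idf variant (tf × log(N/df)) followed by cosine dissimilarity. On the subspace side, it assumes a weighted Euclidean distance with per-cluster dimension weights, in the spirit of projected or soft subspace clustering. The functions tfidf_vectors, cosine_dissimilarity and subspace_distance are hypothetical illustrations. They are not the authors' method or any library's API.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not taken from the paper.


def tfidf_vectors(docs):
    """Text-clustering style: global term weights baked into the representation.

    docs is a list of token lists. Each document becomes a sparse dict
    term -> tf * log(N / df), so a term's idf factor is identical for every
    document in the corpus (assumed, unsmoothed idf variant).
    """
    n = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine_dissimilarity(u, v):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def subspace_distance(x, y, weights):
    """Subspace-clustering style: per-cluster dimension weights applied in the distance.

    A weight of 0 drops a dimension entirely (hard subspace); values between
    0 and 1 only down-weight it (soft subspace).
    """
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


if __name__ == "__main__":
    docs = [
        ["subspace", "clustering", "dimensions"],
        ["text", "clustering", "tfidf"],
        ["subspace", "projected", "clusters"],
    ]
    vecs = tfidf_vectors(docs)
    print(cosine_dissimilarity(vecs[0], vecs[1]))
    # Cluster-specific relevance: dimension 1 is irrelevant for this cluster.
    print(subspace_distance([1.0, 5.0, 0.0], [1.2, -3.0, 0.1], [1.0, 0.0, 1.0]))
```

In the text case, the weights are fixed per term across the whole corpus and enter through the vectors themselves. In the subspace case, each cluster can carry its own weight vector. The same pair of points can therefore be close with respect to one cluster and far apart with respect to another.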