Subspace Co-clustering with Two-Way Graph Convolution

Proceedings of the 31st ACM International Conference on Information & Knowledge Management. Published: 2022-10-17. DOI: 10.1145/3511808.3557706

Chakib Fettal, Lazhar Labiod, M. Nadif


Citations: 3

Abstract

Subspace clustering aims to cluster high-dimensional data lying in a union of low-dimensional subspaces. It has shown good results on image clustering, but text clustering based on document-term matrices has proved more resistant to advances built on this approach. We hypothesize that this is because text data are generally higher-dimensional and sparser than image data, which makes subspace clustering impractical in this setting. Here, we make subspace clustering usable for text by addressing these issues. We first extend the concept of subspace clustering to co-clustering, which has been used extensively on document-term matrices because of the interplay it creates between document and term representations. We then address the sparsity problem through a two-way graph convolution, which promotes the grouping effect credited for the effectiveness of some subspace clustering models. The proposed formulation yields an algorithm that is efficient in both computational and space complexity. We show that our model is competitive with the state of the art on document-term attributed graph datasets in terms of both performance and efficiency.
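The abstract names two ingredients without giving their formulas: a two-way graph convolution that smooths the sparse document-term matrix along both documents (rows) and terms (columns), and a subspace (self-expressive) model applied to both modes for co-clustering. The Python sketch below illustrates these ideas under stated assumptions and is not the authors' algorithm. It assumes a GCN-style symmetrically normalized adjacency with self-loops, a document graph `doc_adj` and a term graph `term_adj` as inputs, hop counts `p` and `q`, and a least-squares self-expressive model with ridge weight `lam`, a model known to exhibit the grouping effect. All function and parameter names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style normalization D^{-1/2} (A + I) D^{-1/2} (assumed, not from the paper)."""
    a = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def two_way_convolution(x: np.ndarray, doc_adj: np.ndarray, term_adj: np.ndarray,
                        p: int = 1, q: int = 1) -> np.ndarray:
    """Smooth X (n docs x m terms): p hops over the document graph on the left,
    q hops over the term graph on the right (hypothetical parameterization)."""
    s_r = normalized_adjacency(doc_adj)     # n x n
    s_c = normalized_adjacency(term_adj)    # m x m
    out = x.astype(float)
    for _ in range(p):
        out = s_r @ out                     # document-side propagation
    for _ in range(q):
        out = out @ s_c                     # term-side propagation
    return out


def self_expressive_coefficients(x: np.ndarray, lam: float):
    """Closed-form minimizers of the least-squares self-expressive objectives
        C = argmin ||X - C X||_F^2 + lam ||C||_F^2        (documents, n x n)
        D = argmin ||X^T - D X^T||_F^2 + lam ||D||_F^2    (terms, m x m)
    i.e. C = X (X^T X + lam I)^{-1} X^T and D = X^T (X X^T + lam I)^{-1} X.
    By the push-through identity both come from a single solve of size min(n, m)."""
    n, m = x.shape
    if m <= n:
        k = np.linalg.solve(x.T @ x + lam * np.eye(m), x.T)   # (X^T X + lam I)^{-1} X^T
        return x @ k, k @ x
    k = np.linalg.solve(x @ x.T + lam * np.eye(n), x)         # (X X^T + lam I)^{-1} X
    return k @ x.T, x.T @ k


def affinity(coef: np.ndarray) -> np.ndarray:
    """Symmetric non-negative affinity commonly built from self-expressive coefficients."""
    return 0.5 * (np.abs(coef) + np.abs(coef.T))


# Toy example: documents 0 and 1 are linked only to each other, so their
# smoothed rows coincide even though their raw term counts differ.
x = np.array([[1.0, 0, 0, 2], [0, 1, 0, 2], [0, 0, 3, 0], [0, 0, 1, 1]])
doc_adj = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
term_adj = np.zeros((4, 4))               # no term graph: term side is the identity
x_tilde = two_way_convolution(x, doc_adj, term_adj, p=1, q=1)
c, d = self_expressive_coefficients(x_tilde, lam=0.1)
w_docs, w_terms = affinity(c), affinity(d)
```

In the toy example, the convolution gives documents 0 and 1 the same smoothed row, and the closed-form coefficients then give them identical rows in `c`, which is the grouping effect in its simplest form. Solving only the smaller of the n×n and m×m systems loosely mirrors the efficiency concern in the abstract; on real corpora one would keep the matrices sparse (for example with `scipy.sparse`) and run a spectral clustering method on `w_docs` and `w_terms`.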
