{"title":"Leveraging Transformer-based autoencoders for low-rank multi-view subspace clustering","authors":"Yuxiu Lin, Hui Liu, Xiao Yu, Caiming Zhang","doi":"10.1016/j.patcog.2024.111331","DOIUrl":null,"url":null,"abstract":"<div><div>Deep multi-view subspace clustering is a hot research topic, aiming to integrate multi-view information to produce accurate cluster prediction. Limited by the inherent heterogeneity of distinct views, existing works primarily rely on view-specific encoding structures for representation learning. Although effective to some extent, this approach may hinder the full exploitation of view information and increase the complexity of model training. To this end, this paper proposes a novel low-rank multi-view subspace clustering method, TALMSC, backed by Transformer-based autoencoders. Specifically, we extend the self-attention mechanism to multi-view clustering settings, developing multiple Transformer-based autoencoders that allow for modality-agnostic representation learning. Based on extracted latent representations, we deploy a sample-wise weighted fusion module that incorporates contrastive learning and orthogonal operators to formulate both consistency and diversity, consequently generating a comprehensive joint representation. Moreover, TALMSC involves a highly flexible low-rank regularizer under the weighted Schatten <span><math><mi>p</mi></math></span>-norm to constrain self-expression and better explore the low-rank structure. Extensive experiments on five multi-view datasets show that our method enjoys superior clustering performance over state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"161 ","pages":"Article 111331"},"PeriodicalIF":7.5000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324010823","RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Deep multi-view subspace clustering is an active research topic, aiming to integrate multi-view information to produce accurate cluster predictions. Limited by the inherent heterogeneity of distinct views, existing works primarily rely on view-specific encoding structures for representation learning. Although effective to some extent, this approach may hinder the full exploitation of view information and increase the complexity of model training. To this end, this paper proposes a novel low-rank multi-view subspace clustering method, TALMSC, backed by Transformer-based autoencoders. Specifically, we extend the self-attention mechanism to multi-view clustering settings, developing multiple Transformer-based autoencoders that allow for modality-agnostic representation learning. Based on extracted latent representations, we deploy a sample-wise weighted fusion module that incorporates contrastive learning and orthogonal operators to formulate both consistency and diversity, consequently generating a comprehensive joint representation. Moreover, TALMSC involves a highly flexible low-rank regularizer under the weighted Schatten p-norm to constrain self-expression and better explore the low-rank structure. Extensive experiments on five multi-view datasets show that our method enjoys superior clustering performance over state-of-the-art methods.
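For reference, the weighted Schatten p-norm regularizer mentioned in the abstract has a standard closed form in terms of singular values: the regularizer value is the weighted sum of singular values each raised to the power p (for p = 1 with uniform weights this reduces to the nuclear norm). The sketch below is a generic illustration of that formula only; the specific weights, the choice of p, and how the regularizer enters TALMSC's objective are not given in this abstract, so the uniform-weight default is an assumption.

```python
import numpy as np

def weighted_schatten_p(X, weights=None, p=0.5):
    """Weighted Schatten p-norm regularizer: sum_i w_i * sigma_i(X)**p.

    sigma_i(X) are the singular values of X in descending order.
    With p=1 and uniform weights this is the nuclear norm.
    """
    # Singular values only (no U, V needed for the regularizer value).
    sigma = np.linalg.svd(X, compute_uv=False)
    if weights is None:
        # Uniform weights are an assumption; weighted variants typically
        # assign smaller weights to larger singular values.
        weights = np.ones_like(sigma)
    return float(np.sum(weights * sigma ** p))
```

For example, a diagonal matrix diag(4, 1) has singular values 4 and 1, so with p = 0.5 the regularizer evaluates to 4**0.5 + 1**0.5 = 3, while p = 1 gives the nuclear norm 5.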
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.