Dynamic T-distributed stochastic neighbor graph convolutional networks for multi-modal contrastive fusion

Bo Xu, Guoxu Li, Jie Wang, Zheng Wang, Jianfu Cao, Rong Wang, Feiping Nie

Neurocomputing, Volume 652, Article 130950. DOI: 10.1016/j.neucom.2025.130950. Published 2025-07-17.

Abstract:
As data acquisition technologies continue to advance, multi-modal data have become a prominent focus across many domains. This paper tackles critical challenges in the multi-modal fusion process, specifically representation learning, learning the consistency (invariance) shared across modalities, and learning the diversity (complementarity) among modalities, by employing graph convolutional networks and contrastive learning. Current GCN-based methods generally depend on predefined graphs for representation learning, which limits their capacity to capture local and global information effectively. Furthermore, some current models do not adequately contrast the consistent and diverse components of the representations across modalities during fusion. To address these challenges, we propose a novel T-distributed Stochastic Neighbor Contrastive Graph Convolutional Network (TSNGCN). It consists of three modules: an adaptive static graph learning module, a multi-modal representation learning module, and a multi-modal contrastive fusion module. The adaptive static graph learning module constructs a pairwise graph adaptively, without relying on any predefined distance metric, so as to preserve the local structure of the data. Moreover, a loss function based on T-distributed stochastic neighbor embedding is designed to learn the transformation between the embeddings and the original data, facilitating the discovery of more discriminative information within the learned subspace. In addition, the proposed multi-modal contrastive fusion module maximizes the similarity of the same sample across different modalities while keeping dissimilar samples apart, thereby strengthening the model's consistency objective. Extensive experiments on several multi-modal benchmark datasets demonstrate the superiority and effectiveness of TSNGCN over existing methods.
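The abstract does not spell out how the adaptive static graph learning module builds its pairwise graph. Below is a minimal sketch, not the authors' code, of one standard way to construct a sparse affinity graph adaptively without a predefined kernel or bandwidth: each row of the affinity matrix solves a small simplex-constrained problem whose closed-form solution keeps only the k nearest samples. The function name and the default k are illustrative assumptions.

    import numpy as np

    def adaptive_neighbor_graph(X: np.ndarray, k: int = 10) -> np.ndarray:
        """Illustrative sketch: learn a sparse pairwise graph S from data X (n x d)."""
        n = X.shape[0]
        # Pairwise squared Euclidean distances.
        sq = np.sum(X ** 2, axis=1)
        D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        np.fill_diagonal(D, np.inf)  # forbid self-loops
        S = np.zeros((n, n))
        for i in range(n):
            idx = np.argsort(D[i])[: k + 1]  # k+1 nearest samples, ascending
            d = D[i, idx]
            # Closed-form solution of min_s <d, s> + gamma * ||s||^2 on the simplex:
            # weights decay linearly with distance and vanish at the (k+1)-th neighbor.
            s = (d[k] - d[:k]) / (k * d[k] - d[:k].sum() + 1e-12)
            S[i, idx[:k]] = s
        return 0.5 * (S + S.T)  # symmetrize before use as a GCN adjacency

The appeal of this family of constructions is that the effective bandwidth adapts per sample: each row's weights are determined by the local distance gap rather than by a single global scale parameter.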
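The "loss function based on T-distributed stochastic neighbor embedding" is likewise described only at a high level. A plausible reading, borrowed directly from t-SNE, is to match graph affinities P with embedding affinities Q computed under a Student-t kernel by minimizing KL(P || Q). The PyTorch sketch below assumes exactly that standard form; it is not taken from the paper.

    import torch

    def t_sne_style_loss(Z: torch.Tensor, P: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
        """KL(P || Q) between given affinities P and embedding affinities Q.

        Z: (n, d) learned embeddings; P: (n, n) nonnegative affinities, zero diagonal.
        """
        sq = (Z ** 2).sum(dim=1)
        D = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)  # pairwise squared distances
        eye = torch.eye(Z.size(0), dtype=torch.bool, device=Z.device)
        Q = (1.0 / (1.0 + D)).masked_fill(eye, 0.0)      # Student-t kernel, 1 dof
        Q = Q / Q.sum().clamp_min(eps)                   # normalize to a distribution
        P = P / P.sum().clamp_min(eps)
        return (P * (torch.log(P + eps) - torch.log(Q + eps))).sum()

The heavy-tailed Student-t kernel is what allows moderately distant pairs to repel in the embedding, the usual rationale for preferring it over a Gaussian in the low-dimensional space.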
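Finally, the contrastive fusion objective, maximizing the similarity of the same sample across modalities while separating different samples, has the shape of a standard InfoNCE loss applied between two modality-specific embeddings. A sketch under that assumption follows; the temperature and names are illustrative, and the paper may combine more than two modalities.

    import torch
    import torch.nn.functional as F

    def cross_modal_infonce(z_a: torch.Tensor, z_b: torch.Tensor,
                            temperature: float = 0.5) -> torch.Tensor:
        """z_a, z_b: (n, d) embeddings of the same n samples in two modalities."""
        z_a = F.normalize(z_a, dim=1)
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / temperature  # (n, n) scaled cosine similarities
        targets = torch.arange(z_a.size(0), device=z_a.device)
        # Row i's positive is column i: the same sample seen in the other modality.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

Averaging the two directions treats the modalities symmetrically; with more than two modalities, such a loss is typically summed over all modality pairs.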
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.