{"title":"Csan: cross-coupled semantic adversarial network for cross-modal retrieval","authors":"Zhuoyi Li, Huibin Lu, Hao Fu, Fanzhen Meng, Guanghua Gu","doi":"10.1007/s10462-025-11152-7","DOIUrl":null,"url":null,"abstract":"<div><p>Cross-modal retrieval aims to correlate multimedia data by bridging the heterogeneity gap. Most cross-modal retrieval approaches learn a common subspace to project the multimedia data into the subspace for directly measuring the similarity. However, the existing cross-modal retrieval frameworks cannot fully capture the semantic consistency in the limited supervision information. In this paper, we propose a Cross-coupled Semantic Adversarial Network (CSAN) for cross-modal retrieval. The main structure of this approach is mainly composed of the generative adversarial network, i.e., each modality branch is equipped with a generator and a discriminator. Besides, a cross-coupled semantic architecture is designed to fully explore the correlation of paired heterogeneous samples. To be specific, we couple a forward branch with an inverse mapping and implement a weight-sharing strategy of the inverse mapping branch to the branch of another modality. Furthermore, a cross-coupled consistency loss is introduced to minimize the semantic gap between the representations of the inverse mapping branch and the forward branch. Extensive qualitative and quantitative experiments are conducted to evaluate the performance of the proposed approach. By comparing against the previous works, the experiment results demonstrate our approach outperforms state-of-the-art works.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"58 5","pages":""},"PeriodicalIF":10.7000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-025-11152-7.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence Review","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10462-025-11152-7","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Cross-modal retrieval aims to correlate multimedia data by bridging the heterogeneity gap. Most cross-modal retrieval approaches learn a common subspace into which data from different modalities are projected so that similarity can be measured directly. However, existing cross-modal retrieval frameworks cannot fully capture semantic consistency from the limited supervision information. In this paper, we propose a Cross-coupled Semantic Adversarial Network (CSAN) for cross-modal retrieval. The approach is built on a generative adversarial network: each modality branch is equipped with a generator and a discriminator. In addition, a cross-coupled semantic architecture is designed to fully explore the correlation of paired heterogeneous samples. Specifically, we couple a forward branch with an inverse mapping and share the weights of the inverse mapping branch with the branch of the other modality. Furthermore, a cross-coupled consistency loss is introduced to minimize the semantic gap between the representations of the inverse mapping branch and the forward branch. Extensive qualitative and quantitative experiments are conducted to evaluate the performance of the proposed approach; the results demonstrate that it outperforms state-of-the-art methods.
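To make the cross-coupled adversarial structure described in the abstract more concrete, the following PyTorch sketch shows one possible reading of it. The feature dimensions, layer widths, module names, and the exact forms of the consistency and adversarial losses are illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# A minimal sketch of a cross-coupled adversarial setup for cross-modal
# retrieval, assuming image/text feature dims and loss forms not given here.
import torch
import torch.nn as nn
import torch.nn.functional as F

COMMON_DIM = 512  # assumed dimensionality of the common subspace

def make_projector(in_dim):
    """Forward branch: maps modality-specific features into the common subspace."""
    return nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                         nn.Linear(1024, COMMON_DIM))

def make_inverse_map():
    """Inverse mapping: remaps a common-space code toward the other modality."""
    return nn.Sequential(nn.Linear(COMMON_DIM, COMMON_DIM), nn.ReLU(),
                         nn.Linear(COMMON_DIM, COMMON_DIM))

def make_discriminator():
    """Discriminator: predicts which modality a common-space code came from."""
    return nn.Sequential(nn.Linear(COMMON_DIM, 256), nn.ReLU(),
                         nn.Linear(256, 1))

g_img, g_txt = make_projector(4096), make_projector(300)   # assumed feature dims
d_img, d_txt = make_discriminator(), make_discriminator()

# One reading of the weight-sharing strategy: the inverse mapping of one
# branch reuses the parameters of the coupled branch's inverse mapping.
inv_txt = make_inverse_map()
inv_img = inv_txt  # parameters shared across the coupled branches

def cross_coupled_consistency(img_feat, txt_feat):
    """Assumed L2 consistency: the inverse-mapping representation of one
    modality is pulled toward the forward representation of its paired sample."""
    z_img, z_txt = g_img(img_feat), g_txt(txt_feat)
    loss_i2t = F.mse_loss(inv_img(z_img), z_txt)  # image code -> paired text code
    loss_t2i = F.mse_loss(inv_txt(z_txt), z_img)  # text code -> paired image code
    return loss_i2t + loss_t2i

def adversarial_loss(img_feat, txt_feat):
    """Standard GAN objective: discriminators separate the modalities in the
    common space while the generators try to make them indistinguishable."""
    z_img, z_txt = g_img(img_feat), g_txt(txt_feat)
    real = torch.ones(z_img.size(0), 1)
    fake = torch.zeros(z_txt.size(0), 1)
    return (F.binary_cross_entropy_with_logits(d_img(z_img), real) +
            F.binary_cross_entropy_with_logits(d_img(z_txt), fake) +
            F.binary_cross_entropy_with_logits(d_txt(z_txt), real) +
            F.binary_cross_entropy_with_logits(d_txt(z_img), fake))
```

In this sketch the generators would be trained on the sum of the consistency and (generator-side) adversarial terms, while the discriminators are trained on the adversarial term alone; how the paper actually balances and schedules these losses is not stated in the abstract.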
Journal description:
Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.