Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning
M. Liang, Junping Du, Xiaowen Cao, Yang Yu, Kangkang Lu, Zhe Xue, Min Zhang
Proceedings of the 30th ACM International Conference on Multimedia, 2022-10-10
DOI: 10.1145/3503161.3548391
Citations: 4
Abstract
Deep cross-media hashing provides an efficient cross-media representation learning solution for cross-media search. However, existing methods do not jointly consider fine-grained semantic features and semantic structures when mining implicit cross-media semantic associations, which weakens the semantic discrimination and consistency of the learned cross-media representations. To tackle this problem, we propose a novel semantic structure enhanced contrastive adversarial hash network for cross-media representation learning (SCAHN). First, to capture finer-grained cross-media semantic associations, we construct a fine-grained cross-media attention feature learning network, so that the learned salient features of each modality better support cross-media semantic alignment and fusion. Second, to further improve the learning of implicit cross-media semantic associations, we build a semantic label association graph and use a graph convolutional network to mine implicit semantic structures, which guides the learning of discriminative features in each modality. Third, we propose a cross-media and intra-media contrastive adversarial representation learning mechanism to further strengthen the semantic discriminativeness of modality-specific representations, together with a dual-way adversarial learning strategy that maximizes cross-media semantic associations; this yields unified cross-media representations with stronger discriminativeness and better semantic consistency. Extensive experiments on several cross-media benchmark datasets demonstrate that SCAHN outperforms state-of-the-art methods.
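Two of the abstract's components, the GCN over a semantic label association graph and the cross-media contrastive objective, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the layer sizes, the symmetric graph normalization (standard Kipf-Welling GCN form), the InfoNCE-style contrastive loss, and all names (`LabelGCN`, `normalize_adj`, `cross_media_contrastive`) are hypothetical, since the abstract does not specify SCAHN's actual architecture, graph construction, or loss formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelGCN(nn.Module):
    """Two-layer GCN over a label association graph (hypothetical sketch).

    Produces one embedding per label; these can guide modality features
    toward the implicit semantic structure, as the abstract describes.
    """
    def __init__(self, num_labels, in_dim, hid_dim, out_dim):
        super().__init__()
        self.emb = nn.Parameter(torch.randn(num_labels, in_dim))  # learnable label inputs
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, adj):
        # adj: (num_labels, num_labels) normalized label association matrix
        h = F.relu(self.w1(adj @ self.emb))
        return self.w2(adj @ h)  # (num_labels, out_dim) label embeddings

def normalize_adj(cooccur):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a label co-occurrence matrix."""
    a = cooccur + torch.eye(cooccur.size(0))
    d = a.sum(1).pow(-0.5)
    return d.unsqueeze(1) * a * d.unsqueeze(0)

def cross_media_contrastive(img, txt, tau=0.2):
    """InfoNCE-style cross-media contrastive loss: paired image/text rows are
    positives, all other pairs in the batch are negatives (a common stand-in
    for the paper's contrastive objective, not its exact formulation)."""
    img = F.normalize(img, dim=1)
    txt = F.normalize(txt, dim=1)
    sim = img @ txt.t() / tau                  # (B, B) similarity logits
    targets = torch.arange(img.size(0))        # diagonal entries are the positives
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))

# Usage: score modality features against label embeddings, then hash.
num_labels, feat_dim, bits = 24, 512, 64
gcn = LabelGCN(num_labels, in_dim=300, hid_dim=256, out_dim=feat_dim)
adj = normalize_adj(torch.rand(num_labels, num_labels))
img_feat = torch.randn(8, feat_dim)            # batch of image features
txt_feat = torch.randn(8, feat_dim)            # paired text features
label_emb = gcn(adj)                           # (24, 512)
logits = img_feat @ label_emb.t()              # semantic-structure-aware label scores
loss = cross_media_contrastive(img_feat, txt_feat)
codes = torch.tanh(nn.Linear(feat_dim, bits)(img_feat))  # relaxed binary hash codes
```

In a full system, the contrastive term would be combined with the adversarial and hashing losses the abstract mentions; here `torch.tanh` stands in for the usual continuous relaxation of the sign function used to produce binary codes.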