Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer

2023 18th International Conference on Machine Vision and Applications (MVA). Pub Date: 2023-07-23. DOI: 10.23919/MVA57639.2023.10216160

Ching-Ching Yang, W. Chu, S. Dubey


Abstract

Weakly-supervised image hashing has emerged recently because web images accompanied by contextual text or tags are abundant. Text that is only weakly related to an image can still be used to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is computed to form more effective visual representations, and a baseline transformer applies self-attention over the tags to form tag representations. Second, the proposed cross-modal transformer captures the cross-modal attention between images and tags. Embedding layers then generate the hash codes. WHCMT is evaluated on semantic image retrieval, where it achieves new state-of-the-art results on the MIRFLICKR-25K and NUS-WIDE datasets.
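The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: image tokens attend to tag tokens, the fused tokens are projected to a fixed number of bits, and retrieval compares codes by Hamming distance. Everything here is an assumption made for illustration, not the WHCMT architecture: the function names, the dimensions, the random (untrained) projection, the mean pooling, and the sign binarization are all hypothetical, and the sketch omits learned query/key/value projections, multi-head attention, cross-scale attention, and any training loss.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention where image tokens (queries) attend to
    tag tokens (used as both keys and values; an illustrative simplification)."""
    dim = len(queries[0])
    scores = matmul(queries, transpose(keys_values))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, keys_values), weights


def hash_code(tokens, projection):
    """Mean-pool the tokens, project to n_bits values, and binarize by sign.
    The pooling and binarization choices are assumptions, not from the paper."""
    count = len(tokens)
    pooled = [[sum(col) / count for col in zip(*tokens)]]
    logits = matmul(pooled, projection)[0]
    return [1 if v >= 0 else 0 for v in logits]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy data: random features stand in for patch and tag embeddings.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


DIM, N_BITS = 8, 16                      # hypothetical sizes
image_tokens = rand_matrix(4, DIM)       # 4 image-patch tokens
tag_tokens = rand_matrix(3, DIM)         # 3 tag tokens
projection = rand_matrix(DIM, N_BITS)    # untrained stand-in for embedding layers

fused, weights = cross_attention(image_tokens, tag_tokens)
code = hash_code(fused, projection)
```

In a retrieval setting, database images would be ranked by `hamming(query_code, database_code)`, smallest distance first; a trained model would replace the random matrices above with learned parameters.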


