{"title":"2S-DFN: Dual-semantic Decoding Fusion Networks for Fine-grained Image Recognition","authors":"Pufen Zhang, Peng Shi, Song Zhang","doi":"10.1109/icme55011.2023.00012","DOIUrl":null,"url":null,"abstract":"In previous fine-grained image recognition (FGIR) methods, the single global or local semantic fusion view may not be comprehensive to reveal the semantic associations between image and text. Besides, the encoding fusion strategy cannot fuse the semantics finely because the low-order text semantic dependence and the irrelevant semantic concepts are fused. To address these issues, a novel Dual-Semantic Decoding Fusion Networks (2S-DFN) is proposed for FGIR. Specifically, a multilayer text semantic encoder is first constructed to extract the higher-order semantics dependence among text. To obtain sufficient semantic association, two decoding semantic fusion streams are symmetrically designed from the global and local perspectives. Moreover, by decoding way to implant text features to semantic fusion layer as well as cascading it deeply, two streams fuse the semantics of text and image finely. Extensive experiments demonstrate that the effectiveness of the proposed method and 2S-DFN attains the state-of-the-art results on two benchmark datasets.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icme55011.2023.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
In previous fine-grained image recognition (FGIR) methods, a single global or local semantic fusion view may not be comprehensive enough to reveal the semantic associations between image and text. Moreover, an encoding fusion strategy cannot fuse semantics finely, because it fuses low-order text semantic dependencies together with irrelevant semantic concepts. To address these issues, a novel Dual-Semantic Decoding Fusion Network (2S-DFN) is proposed for FGIR. Specifically, a multilayer text semantic encoder is first constructed to extract higher-order semantic dependencies within the text. To obtain sufficient semantic associations, two decoding semantic fusion streams are symmetrically designed from the global and local perspectives. Moreover, by implanting text features into the semantic fusion layers in a decoding manner and cascading these layers deeply, the two streams fuse the semantics of text and image finely. Extensive experiments demonstrate the effectiveness of the proposed method, and 2S-DFN attains state-of-the-art results on two benchmark datasets.
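The abstract does not give implementation details, but the "decoding" fusion it describes is commonly realized as decoder-style cross-attention, where image features act as queries over the encoded text features. The minimal NumPy sketch below illustrates that idea under assumed shapes (one global image token, 49 local patch tokens, 12 text tokens, dimension 64); all names and sizes are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Decoder-style fusion: image tokens (queries) attend over text tokens."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)       # (n_img, n_text)
    return softmax(scores, axis=-1) @ values     # (n_img, d)

rng = np.random.default_rng(0)
text = rng.standard_normal((12, 64))         # higher-order text semantics (assumed encoder output)
global_img = rng.standard_normal((1, 64))    # global image feature
local_img = rng.standard_normal((49, 64))    # local patch features

# Two symmetric decoding fusion streams, one global and one local;
# the residual connection keeps the original image semantics.
global_fused = global_img + cross_attention(global_img, text, text)
local_fused = local_img + cross_attention(local_img, text, text)

print(global_fused.shape, local_fused.shape)  # (1, 64) (49, 64)
```

In a full model this fusion layer would be cascaded several times per stream, as the abstract's "cascading it deeply" suggests, with learned projections for queries, keys, and values rather than the raw features used here.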