2S-DFN: Dual-semantic Decoding Fusion Networks for Fine-grained Image Recognition

Pufen Zhang, Peng Shi, Song Zhang
{"title":"2S-DFN: Dual-semantic Decoding Fusion Networks for Fine-grained Image Recognition","authors":"Pufen Zhang, Peng Shi, Song Zhang","doi":"10.1109/icme55011.2023.00012","DOIUrl":null,"url":null,"abstract":"In previous fine-grained image recognition (FGIR) methods, the single global or local semantic fusion view may not be comprehensive to reveal the semantic associations between image and text. Besides, the encoding fusion strategy cannot fuse the semantics finely because the low-order text semantic dependence and the irrelevant semantic concepts are fused. To address these issues, a novel Dual-Semantic Decoding Fusion Networks (2S-DFN) is proposed for FGIR. Specifically, a multilayer text semantic encoder is first constructed to extract the higher-order semantics dependence among text. To obtain sufficient semantic association, two decoding semantic fusion streams are symmetrically designed from the global and local perspectives. Moreover, by decoding way to implant text features to semantic fusion layer as well as cascading it deeply, two streams fuse the semantics of text and image finely. Extensive experiments demonstrate that the effectiveness of the proposed method and 2S-DFN attains the state-of-the-art results on two benchmark datasets.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icme55011.2023.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In previous fine-grained image recognition (FGIR) methods, a single global or local semantic fusion view may not be comprehensive enough to reveal the semantic associations between image and text. Moreover, encoding-based fusion strategies cannot fuse semantics finely, because low-order text semantic dependencies and irrelevant semantic concepts are fused together. To address these issues, a novel model, Dual-Semantic Decoding Fusion Networks (2S-DFN), is proposed for FGIR. Specifically, a multilayer text semantic encoder is first constructed to extract higher-order semantic dependencies among the text. To obtain sufficient semantic associations, two decoding semantic fusion streams are designed symmetrically from the global and local perspectives. Furthermore, by implanting text features into the semantic fusion layers in a decoding manner and cascading them deeply, the two streams fuse the semantics of text and image at a fine granularity. Extensive experiments demonstrate the effectiveness of the proposed method, and 2S-DFN attains state-of-the-art results on two benchmark datasets.
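
To make the dual-stream decoding fusion idea concrete, the following is a minimal PyTorch sketch. It assumes cross-attention decoder layers in which image tokens act as queries and the encoded text tokens act as keys and values, a mean-pooled image token for the global stream, and patch tokens for the local stream. The module names (DecodingFusionLayer, DualStreamDecodingFusion), feature dimensions, and depth are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of decoding-style dual-stream fusion; sizes and layer
# choices are assumptions for illustration only.
import torch
import torch.nn as nn


class DecodingFusionLayer(nn.Module):
    """One decoding-style fusion layer: image tokens attend to text tokens
    via cross-attention, so text semantics are implanted into the visual stream."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        x = img_tokens
        x = x + self.self_attn(self.norm1(x), self.norm1(x), self.norm1(x))[0]
        # Decoding-style fusion: queries come from the image, keys/values from text.
        x = x + self.cross_attn(self.norm2(x), text_tokens, text_tokens)[0]
        x = x + self.ffn(self.norm3(x))
        return x


class DualStreamDecodingFusion(nn.Module):
    """Two symmetric cascades of fusion layers: a global stream fed with a
    pooled image representation and a local stream fed with patch tokens."""

    def __init__(self, dim: int = 256, depth: int = 3):
        super().__init__()
        self.global_stream = nn.ModuleList([DecodingFusionLayer(dim) for _ in range(depth)])
        self.local_stream = nn.ModuleList([DecodingFusionLayer(dim) for _ in range(depth)])

    def forward(self, patch_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # Global view: a single pooled token summarising the whole image.
        global_tok = patch_tokens.mean(dim=1, keepdim=True)
        local_tok = patch_tokens
        for g_layer, l_layer in zip(self.global_stream, self.local_stream):
            global_tok = g_layer(global_tok, text_tokens)   # global fusion stream
            local_tok = l_layer(local_tok, text_tokens)     # local fusion stream
        # Concatenate both views for a downstream recognition head.
        return torch.cat([global_tok.squeeze(1), local_tok.mean(dim=1)], dim=-1)


if __name__ == "__main__":
    model = DualStreamDecodingFusion(dim=256, depth=3)
    img = torch.randn(2, 196, 256)    # e.g. 14x14 grid of patch features
    txt = torch.randn(2, 32, 256)     # encoded text tokens
    print(model(img, txt).shape)      # torch.Size([2, 512])

Cascading the same fusion layer several times mirrors the deep cascading described in the abstract: later layers refine the image-text alignment produced by earlier ones, separately for the global and local views.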