Inter-Modality Fusion Based Attention for Zero-Shot Cross-Modal Retrieval

Bela Chakraborty, Peng Wang, Lei Wang
{"title":"Inter-Modality Fusion Based Attention for Zero-Shot Cross-Modal Retrieval","authors":"Bela Chakraborty, Peng Wang, Lei Wang","doi":"10.1109/ICIP42928.2021.9506182","DOIUrl":null,"url":null,"abstract":"Zero-shot cross-modal retrieval (ZS-CMR) performs the task of cross-modal retrieval where the classes of test categories have a different scope than the training categories. It borrows the intuition from zero-shot learning which targets to transfer the knowledge inferred during the training phase for seen classes to the testing phase for unseen classes. It mimics the real-world scenario where new object categories are continuously populating the multi-media data corpus. Unlike existing ZS-CMR approaches which use generative adversarial networks (GANs) to generate more data, we propose Inter-Modality Fusion based Attention (IMFA) and a framework ZS_INN_FUSE (Zero-Shot cross-modal retrieval using INNer product with image-text FUSEd). It exploits the rich semantics of textual data as guidance to infer additional knowledge during the training phase. This is achieved by generating attention weights through the fusion of image and text modalities to focus on the important regions in an image. We carefully create a zero-shot split based on the large-scale MS-COCO and Flickr30k datasets to perform experiments. The results show that our method achieves improvement over the ZS-CMR baseline and self-attention mechanism, demonstrating the effectiveness of inter-modality fusion in a zero-shot scenario.","PeriodicalId":314429,"journal":{"name":"2021 IEEE International Conference on Image Processing (ICIP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP42928.2021.9506182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Zero-shot cross-modal retrieval (ZS-CMR) performs cross-modal retrieval in a setting where the test categories differ in scope from the training categories. It borrows its intuition from zero-shot learning, which aims to transfer knowledge inferred during the training phase on seen classes to the testing phase on unseen classes, and it mimics the real-world scenario in which new object categories continuously populate the multimedia data corpus. Unlike existing ZS-CMR approaches, which use generative adversarial networks (GANs) to generate additional data, we propose Inter-Modality Fusion based Attention (IMFA) and the framework ZS_INN_FUSE (Zero-Shot cross-modal retrieval using INNer product with image-text FUSEd). The framework exploits the rich semantics of textual data as guidance to infer additional knowledge during the training phase. This is achieved by generating attention weights through the fusion of the image and text modalities, so that attention focuses on the important regions of an image. We carefully create a zero-shot split based on the large-scale MS-COCO and Flickr30k datasets to perform experiments. The results show that our method improves over both a ZS-CMR baseline and a self-attention mechanism, demonstrating the effectiveness of inter-modality fusion in the zero-shot scenario.
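The following is a minimal PyTorch sketch of the idea the abstract describes: a text embedding is fused with per-region image features to produce attention weights over the regions, and the attended image embedding is matched to the text embedding by an inner product. All names (IMFASketch), dimensions, and the choice of inputs (precomputed region features plus a single sentence embedding) are illustrative assumptions, not the paper's exact IMFA formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMFASketch(nn.Module):
    """Hypothetical sketch of inter-modality fusion based attention.

    Assumes image regions arrive as precomputed features (e.g., from a
    region detector) and the caption as a single text embedding; the
    encoders and dimensions in the actual paper may differ.
    """

    def __init__(self, img_dim=2048, txt_dim=300, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)  # project region features
        self.txt_proj = nn.Linear(txt_dim, joint_dim)  # project text embedding
        self.fuse = nn.Linear(2 * joint_dim, 1)        # fused features -> attention logit

    def forward(self, img_regions, txt_emb):
        # img_regions: (B, R, img_dim) region features; txt_emb: (B, txt_dim)
        img = self.img_proj(img_regions)               # (B, R, D)
        txt = self.txt_proj(txt_emb).unsqueeze(1)      # (B, 1, D)
        # Fuse both modalities per region, then derive text-guided weights.
        fused = torch.cat([img, txt.expand_as(img)], dim=-1)   # (B, R, 2D)
        attn = F.softmax(self.fuse(fused).squeeze(-1), dim=1)  # (B, R)
        img_global = (attn.unsqueeze(-1) * img).sum(dim=1)     # attended image embedding
        # Inner-product similarity of L2-normalised embeddings, in the
        # spirit of the "INNer product" matching in ZS_INN_FUSE.
        sim = (F.normalize(img_global, dim=-1) *
               F.normalize(txt.squeeze(1), dim=-1)).sum(dim=-1)
        return sim

if __name__ == "__main__":
    model = IMFASketch()
    scores = model(torch.randn(4, 36, 2048), torch.randn(4, 300))
    print(scores.shape)  # torch.Size([4])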