Semantic Embedding Uncertainty Learning for Image and Text Matching

Yan Wang, Yunzhi Su, Wenhui Li, C. Yan, Bolun Zheng, Xuanya Li, Anjin Liu
DOI: 10.1109/ICME55011.2023.00153
Published in: 2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023
Citations: 0

Abstract

Image and text matching measures semantic similarity for cross-modal retrieval. The core of this task is semantic embedding, which mines the intrinsic characteristics of the visual and textual modalities for discriminative representation. However, the cross-modal ambiguity of image and text (the existence of one-to-many associations) gives rise to semantic diversity. Mainstream approaches use fixed-point embeddings to represent semantics, ignoring the embedding uncertainty caused by semantic diversity and thereby producing incorrect results. To address this issue, we propose a novel Semantic Embedding Uncertainty Learning (SEUL) method, which represents the embedding uncertainty of image and text as Gaussian distributions and simultaneously learns the salient embedding (mean) and uncertainty (variance) in the common space. We design semantic uncertainty embedding to improve the robustness of the representation under semantic diversity. A combined objective function is proposed that optimizes the semantic uncertainty while maintaining discriminability to enhance cross-modal associations. Extensive experiments on two datasets demonstrate the method's advanced performance.
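The abstract describes representing each image or text embedding as a Gaussian distribution in the common space, with the mean acting as the salient embedding and the variance capturing uncertainty. The paper's actual architecture is not reproduced here; the following is a minimal NumPy sketch of that general idea, assuming simple linear projection heads (the names `gaussian_embed`, `w_mu`, and `w_logvar` are hypothetical, not from the paper), with a reparameterized sample drawn from the resulting distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_embed(features, w_mu, w_logvar):
    # Project modality features into the common space as a Gaussian:
    # mu is the salient embedding (mean); logvar encodes the
    # embedding uncertainty (log of the per-dimension variance).
    mu = features @ w_mu
    logvar = features @ w_logvar
    return mu, logvar

def sample_embedding(mu, logvar, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # so a stochastic embedding can still be trained by gradient descent.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Toy example: two 4-d feature vectors mapped to a 3-d common space.
feats = rng.standard_normal((2, 4))
w_mu = rng.standard_normal((4, 3))
w_logvar = 0.1 * rng.standard_normal((4, 3))

mu, logvar = gaussian_embed(feats, w_mu, w_logvar)
z = sample_embedding(mu, logvar, rng)
print(mu.shape, z.shape)  # (2, 3) (2, 3)
```

In a full model, a matching loss on the means would maintain discriminability while a regularizer on the variances keeps the uncertainty well behaved, in the spirit of the combined objective the abstract mentions.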