Semantic Embedding Uncertainty Learning for Image and Text Matching
Yan Wang, Yunzhi Su, Wenhui Li, C. Yan, Bolun Zheng, Xuanya Li, Anjin Liu
2023 IEEE International Conference on Multimedia and Expo (ICME), July 2023. DOI: 10.1109/ICME55011.2023.00153
Image and text matching measures semantic similarity for cross-modal retrieval. The core of this task is semantic embedding, which mines the intrinsic characteristics of visual and textual data to build discriminative representations. However, the cross-modal ambiguity between images and text (the existence of one-to-many associations) gives rise to semantic diversity. Mainstream approaches represent semantics with fixed point embeddings, ignoring the embedding uncertainty caused by semantic diversity and thus producing incorrect results. To address this issue, we propose a novel Semantic Embedding Uncertainty Learning (SEUL) method, which represents the embedding uncertainty of images and text as Gaussian distributions and simultaneously learns the salient embedding (mean) and the uncertainty (variance) in the common space. We design semantic uncertainty embedding to improve the robustness of the representation under semantic diversity. We further propose a combined objective function that optimizes the semantic uncertainty while maintaining discriminability to enhance cross-modal associations. Extensive experiments on two datasets demonstrate the advanced performance of our method.
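The abstract does not include an implementation, but the core idea of learning a mean (salient embedding) and a variance (uncertainty) per modality can be sketched concretely. Below is a minimal, hypothetical PyTorch illustration of such a Gaussian embedding head; the class name, layer sizes, and reparameterized sampling are assumptions for illustration, not the authors' actual SEUL architecture or objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEmbeddingHead(nn.Module):
    """Hypothetical sketch: maps a modality-specific feature vector to a
    Gaussian in the common embedding space. The mean plays the role of the
    salient embedding and the (log-)variance models embedding uncertainty,
    in the spirit of the abstract's description."""

    def __init__(self, feat_dim: int, embed_dim: int):
        super().__init__()
        self.mean_fc = nn.Linear(feat_dim, embed_dim)    # salient embedding (mean)
        self.logvar_fc = nn.Linear(feat_dim, embed_dim)  # uncertainty (log variance)

    def forward(self, feat: torch.Tensor, n_samples: int = 1):
        # Unit-normalized mean embedding for cosine-style matching.
        mean = F.normalize(self.mean_fc(feat), dim=-1)
        logvar = self.logvar_fc(feat)
        std = torch.exp(0.5 * logvar)
        # Reparameterization trick: draw samples from N(mean, diag(std^2))
        # so that sampling stays differentiable during training.
        eps = torch.randn(n_samples, *mean.shape, device=feat.device)
        samples = mean.unsqueeze(0) + eps * std.unsqueeze(0)
        return mean, logvar, samples

# Example usage with assumed feature dimensions (e.g., CNN image features
# and transformer text features projected into a shared 1024-d space):
# img_head = GaussianEmbeddingHead(feat_dim=2048, embed_dim=1024)
# txt_head = GaussianEmbeddingHead(feat_dim=768, embed_dim=1024)
```

Under this kind of formulation, a matching loss can be applied to the means to preserve discriminability, while a separate regularizer on the variances penalizes over- or under-confident uncertainty estimates; the paper's combined objective presumably balances terms of this nature, though its exact form is not given in the abstract.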