Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned

2023 18th International Conference on Machine Vision and Applications (MVA)
Pub Date: 2022-09-26
DOI: 10.23919/MVA57639.2023.10215754

Ahmed Sabir

Citations: 0

Abstract

This paper focuses on improving the captions produced by image captioning systems. We propose an approach that selects the candidate caption most closely related to the image, rather than the most likely output produced by the model. Our model re-examines the beam search output of the language generator from a visual context perspective. We apply a visual semantic measure at both the word and the sentence level to match the appropriate caption to the related information in the image. The approach can be applied to any captioning system as a post-processing step.
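The post-processing idea above (re-scoring beam search candidates by how well they relate to the visual context, at word and sentence level) can be illustrated with a minimal sketch. It is not the paper's scoring function. Everything in it is an assumption for illustration: `TOY_EMBEDDINGS` is a hypothetical 3-d word table standing in for a pretrained encoder, the visual context is a hypothetical list of (label, confidence) pairs such as an object classifier might output, and `rerank`, `alpha` and `beta` are made-up names for a simple linear interpolation between the language-model probability and the visual relatedness score.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical 3-d toy embeddings standing in for a pretrained encoder.
# Caption words or labels missing from this table are ignored (score 0).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dog": (0.9, 0.1, 0.0),
    "cat": (0.7, 0.0, 0.3),
    "frisbee": (0.2, 0.9, 0.1),
    "park": (0.1, 0.3, 0.9),
    "sofa": (0.0, 0.2, 0.8),
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def known_vectors(words: Sequence[str]) -> List[Vector]:
    """Embeddings of the caption words that appear in the toy table."""
    return [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]


def word_level(words: Sequence[str], label: str) -> float:
    """Word level: best match between the label and any single caption word."""
    label_vec = TOY_EMBEDDINGS.get(label)
    if label_vec is None:
        return 0.0
    return max((cosine(v, label_vec) for v in known_vectors(words)), default=0.0)


def sentence_level(words: Sequence[str], label: str) -> float:
    """Sentence level: label vs. the mean of the caption's word vectors."""
    label_vec = TOY_EMBEDDINGS.get(label)
    vecs = known_vectors(words)
    if label_vec is None or not vecs:
        return 0.0
    mean = tuple(sum(col) / len(vecs) for col in zip(*vecs))
    return cosine(mean, label_vec)


def rerank(
    candidates: List[Tuple[str, float]],      # (caption, LM probability) from beam search
    visual_context: List[Tuple[str, float]],  # (label, confidence), hypothetical detector output
    alpha: float = 0.5,  # weight of visual relatedness vs. LM probability (assumed)
    beta: float = 0.5,   # weight of word-level vs. sentence-level similarity (assumed)
) -> List[Tuple[str, float]]:
    """Re-score beam candidates and return them best-first."""
    total_conf = sum(conf for _, conf in visual_context) or 1.0
    scored = []
    for caption, lm_prob in candidates:
        words = caption.lower().split()
        # Confidence-weighted average of the mixed word/sentence similarity.
        relatedness = sum(
            conf * (beta * word_level(words, label) + (1 - beta) * sentence_level(words, label))
            for label, conf in visual_context
        ) / total_conf
        scored.append((caption, (1 - alpha) * lm_prob + alpha * relatedness))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    beam = [("a cat sitting on a sofa", 0.40), ("a dog catching a frisbee in a park", 0.35)]
    visual = [("dog", 0.8), ("frisbee", 0.6)]
    for caption, score in rerank(beam, visual):
        print(f"{score:.3f}  {caption}")
    # 0.605  a dog catching a frisbee in a park
    # 0.482  a cat sitting on a sofa
```

In this toy run the language model slightly prefers the cat caption, but the visual context (dog, frisbee) promotes the dog caption to the top. The linear interpolation is only one possible way to combine the two signals; a product or log-linear combination would be an equally reasonable assumption.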
