Incorporating the Graph Representation of Video and Text into Video Captioning

Min Lu, Yuan Li
Published in: 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022-10-01
DOI: 10.1109/ICTAI56018.2022.00065 (https://doi.org/10.1109/ICTAI56018.2022.00065)

Abstract

Video captioning translates video content into textual descriptions. In the encoding phase, existing approaches encode irrelevant background and uncorrelated visual objects into the visual features, which leads to a semantic deviation between the visual features and the expected caption. In the decoding phase, word-by-word prediction infers the next word only from the previously generated part of the caption, and this local text context is insufficient for word prediction. To address these two issues, the proposed method derives the video and text representations from convolutions on two graphs. The convolution on the video graph distills the visual features by filtering out the irrelevant background and uncorrelated salient objects; the key issue here is identifying similar videos according to their semantic features. The word graph is constructed to incorporate the global neighborhood among words into the word representation. This global word neighborhood serves as the global text context and compensates for the limited local text context. Results on two benchmark datasets show the advantage of the proposed method. Experimental analysis is also conducted to verify the effectiveness of the proposed modules.
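
The abstract does not say how the video graph is built or which form of graph convolution is used, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes videos are linked to their k most similar videos by cosine similarity of their semantic features, and it applies one layer of the widely used symmetrically normalized graph convolution, ReLU(D^-1/2 (A + I) D^-1/2 X W). All function names, the toy features, and the identity weight matrix are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_adjacency(features, k):
    """Symmetric 0/1 adjacency linking each node to its k most similar nodes.

    Assumption: similarity is cosine similarity of semantic features.
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def graph_conv(features, adj, weight):
    """One graph convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(features)
    dim_in, dim_out = len(weight), len(weight[0])
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        # Aggregate degree-normalized features from the node and its neighbors.
        agg = [0.0] * dim_in
        for j in range(n):
            if a_hat[i][j]:
                coeff = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for d in range(dim_in):
                    agg[d] += coeff * features[j][d]
        # Linear projection followed by ReLU.
        out.append(
            [max(0.0, sum(agg[d] * weight[d][o] for d in range(dim_in))) for o in range(dim_out)]
        )
    return out


# Toy semantic features for four videos: two resemble each other, as do the other two.
videos = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = knn_adjacency(videos, k=1)
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix
smoothed = graph_conv(videos, adj, identity)
```

In this toy setup each video's feature is averaged with that of its most similar video, which is one simple way features shared by similar videos could be reinforced relative to video-specific content. A word graph could be handled with the same two functions by treating words as nodes; how its edges are defined (for example, by co-occurrence) is again an assumption, since the abstract does not specify it.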