Video Object Linguistic Grounding

1st International Workshop on Multimodal Understanding and Learning for Embodied Applications. Pub Date: 2019-10-15. DOI: 10.1145/3347450.3357662

Alba Herrera-Palacio, Carles Ventura, Xavier Giró-i-Nieto

Citations: 1

Abstract

The goal of this work is to segment, in a video sequence, the objects that are mentioned in a linguistic description of the scene. We adapted an existing deep neural network that achieves state-of-the-art performance in semi-supervised video object segmentation by adding a linguistic branch that generates an attention map over the video frames, making the segmentation of the objects temporally consistent along the sequence.
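The abstract does not give implementation details, so the following is only an illustrative sketch of how a linguistic branch might produce an attention map over video frames. It assumes each frame is already encoded as an H x W grid of feature vectors and the description as a single sentence embedding of the same dimension; the dot-product scoring, the spatial softmax, and the names `language_attention_map` and `attend_video` are hypothetical choices made for this example, not the authors' network.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def language_attention_map(features: FeatureMap, sentence: Vector) -> List[List[float]]:
    """Score each spatial location by its dot product with the sentence
    embedding, then normalise the scores with a softmax over all locations."""
    scores = [
        [sum(f * s for f, s in zip(cell, sentence)) for cell in row]
        for row in features
    ]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def attend_video(frames: List[FeatureMap], sentence: Vector) -> List[List[List[float]]]:
    """Reuse the same sentence embedding for every frame, giving one
    attention map per frame driven by the same description."""
    return [language_attention_map(frame, sentence) for frame in frames]


# Toy data (assumed, not from the paper): 2 frames, 2 x 2 locations, 2 channels.
frames = [
    [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 0.0], [0.5, 0.5]]],
    [[[0.0, 1.0], [1.0, 0.0]],
     [[0.5, 0.5], [0.0, 0.0]]],
]
sentence = [2.0, 0.0]  # hypothetical embedding that responds to channel 0

maps = attend_video(frames, sentence)
```

In this toy input the location whose features best match the sentence embedding moves from the top-left cell in the first frame to the top-right cell in the second, and the attention peak follows it. In a real system the resulting maps would be combined with the segmentation network's features; how the paper does this is not stated in the abstract.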
