Video Object Linguistic Grounding

1st International Workshop on Multimodal Understanding and Learning for Embodied Applications. Pub Date: 2019-10-15. DOI: 10.1145/3347450.3357662

Alba Herrera-Palacio, Carles Ventura, Xavier Giró-i-Nieto

Citations: 1

Abstract

The goal of this work is to segment, in a video sequence, the objects that are mentioned in a linguistic description of the scene. We have adapted an existing deep neural network that achieves state-of-the-art performance in semi-supervised video object segmentation, adding a linguistic branch that generates an attention map over the video frames so that the segmentation of the objects is temporally consistent along the sequence.
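The idea of a linguistic branch that produces an attention map over video frames can be illustrated with a small sketch. The abstract does not specify how the map is computed, so the code below is not the authors' network: it swaps in plain scaled dot-product attention with a softmax over spatial locations as a generic stand-in. The names `attention_map` and `attend_video`, the toy features, and the query vector are all hypothetical.

```python
import math

def attention_map(frame_feats, text_emb):
    """Softmax attention over the spatial locations of one frame.

    frame_feats: list of per-location feature vectors (assumed layout).
    text_emb: embedding of the linguistic description, same length as
    each feature vector (assumed to come from some text encoder).
    """
    scale = math.sqrt(len(text_emb))
    # Similarity between the description and every location.
    scores = [sum(f * t for f, t in zip(feat, text_emb)) / scale
              for feat in frame_feats]
    # Numerically stable softmax so the weights sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_video(video_feats, text_emb):
    # The same description embedding is reused for every frame, which is
    # one simple way to keep the attended object consistent over time.
    return [attention_map(frame, text_emb) for frame in video_feats]

# Toy video: 2 frames, 3 locations each, 2-dimensional features.
video = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
query = [2.0, 0.0]  # hypothetical embedding of the description

maps = attend_video(video, query)
peaks = [m.index(max(m)) for m in maps]  # most attended location per frame
```

In this toy input the described object moves from location 0 in the first frame to location 1 in the second, and the attention peak follows it. A real system would feed such a map into the segmentation network rather than read off the peak directly.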
