基于前向-后向网络的视频预测:更深入的时空一致性研究

Yuke Li
{"title":"基于前向-后向网络的视频预测:更深入的时空一致性研究","authors":"Yuke Li","doi":"10.1145/3240508.3240551","DOIUrl":null,"url":null,"abstract":"Video forecasting is an emerging topic in the computer vision field, and it is a pivotal step toward unsupervised video understanding. However, the predictions generated from the state-of-the-art methods might be far from ideal quality, due to a lack of guidance from the labeled data of correct predictions (e.g., the annotated future pose of a person). Hence, building a network for better predicting future sequences in an unsupervised manner has to be further pursued. To this end, we put forth a novel Forward-Backward-Net (FB-Net) architecture, which delves deeper into spatiotemporal consistency. It first derives the forward consistency from the raw historical observations. In contrast to mainstream video forecasting approaches, FB-Net then investigates the backward consistency from the future to the past to reinforce the predictions. The final predicted results are inferred by jointly taking both the forward and backward consistencies into account. Moreover, we embed the motion dynamics and the visual content into a single framework via the FB-Net architecture, which significantly differs from learning each component throughout the videos separately. We evaluate our FB-Net on the large-scale KTH and UCF101 datasets. The experiments show that it can introduce considerable margin improvements with respect to most recent leading studies.","PeriodicalId":339857,"journal":{"name":"Proceedings of the 26th ACM international conference on Multimedia","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency\",\"authors\":\"Yuke Li\",\"doi\":\"10.1145/3240508.3240551\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video forecasting is an emerging topic in the computer vision field, and it is a pivotal step toward unsupervised video understanding. However, the predictions generated from the state-of-the-art methods might be far from ideal quality, due to a lack of guidance from the labeled data of correct predictions (e.g., the annotated future pose of a person). Hence, building a network for better predicting future sequences in an unsupervised manner has to be further pursued. To this end, we put forth a novel Forward-Backward-Net (FB-Net) architecture, which delves deeper into spatiotemporal consistency. It first derives the forward consistency from the raw historical observations. In contrast to mainstream video forecasting approaches, FB-Net then investigates the backward consistency from the future to the past to reinforce the predictions. The final predicted results are inferred by jointly taking both the forward and backward consistencies into account. Moreover, we embed the motion dynamics and the visual content into a single framework via the FB-Net architecture, which significantly differs from learning each component throughout the videos separately. We evaluate our FB-Net on the large-scale KTH and UCF101 datasets. The experiments show that it can introduce considerable margin improvements with respect to most recent leading studies.\",\"PeriodicalId\":339857,\"journal\":{\"name\":\"Proceedings of the 26th ACM international conference on Multimedia\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 26th ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3240508.3240551\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 26th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3240508.3240551","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

视频预测是计算机视觉领域的一个新兴课题,是实现无监督视频理解的关键一步。然而,由于缺乏正确预测的标记数据(例如,注释的人的未来姿势)的指导,从最先进的方法生成的预测可能远非理想的质量。因此,建立一个以无监督的方式更好地预测未来序列的网络必须进一步追求。为此,我们提出了一种新的向前-向后网络(FB-Net)架构,该架构更深入地研究了时空一致性。它首先从原始的历史观察中推导出向前的一致性。与主流视频预测方法相比,FB-Net随后研究了从未来到过去的向后一致性,以加强预测。最终的预测结果是综合考虑前向和后向一致性来推断的。此外,我们通过FB-Net架构将运动动力学和视觉内容嵌入到单个框架中,这与在整个视频中单独学习每个组件有很大不同。我们在大规模KTH和UCF101数据集上对FB-Net进行了评估。实验表明,相对于最近的领先研究,它可以引入相当大的边际改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency
Video forecasting is an emerging topic in the computer vision field, and it is a pivotal step toward unsupervised video understanding. However, the predictions generated from the state-of-the-art methods might be far from ideal quality, due to a lack of guidance from the labeled data of correct predictions (e.g., the annotated future pose of a person). Hence, building a network for better predicting future sequences in an unsupervised manner has to be further pursued. To this end, we put forth a novel Forward-Backward-Net (FB-Net) architecture, which delves deeper into spatiotemporal consistency. It first derives the forward consistency from the raw historical observations. In contrast to mainstream video forecasting approaches, FB-Net then investigates the backward consistency from the future to the past to reinforce the predictions. The final predicted results are inferred by jointly taking both the forward and backward consistencies into account. Moreover, we embed the motion dynamics and the visual content into a single framework via the FB-Net architecture, which significantly differs from learning each component throughout the videos separately. We evaluate our FB-Net on the large-scale KTH and UCF101 datasets. The experiments show that it can introduce considerable margin improvements with respect to most recent leading studies.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信