Temporal Residual Networks for Dynamic Scene Recognition

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2017-07-22 DOI:10.1109/CVPR.2017.786

Christoph Feichtenhofer, A. Pinz, Richard P. Wildes

Pages: 7435-7444

Citations: 75

Abstract

This paper combines three contributions to establish a new state-of-the-art in dynamic scene recognition. First, we present a novel ConvNet architecture based on temporal residual units that is fully convolutional in spacetime. Our model augments spatial ResNets with convolutions across time to hierarchically add temporal residuals as the depth of the network increases. Second, existing approaches to video-based recognition are categorized and a baseline of seven previously top performing algorithms is selected for comparative evaluation on dynamic scenes. Third, we introduce a new and challenging video database of dynamic scenes that more than doubles the size of those previously available. This dataset is explicitly split into two subsets of equal size that contain videos with and without camera motion to allow for systematic study of how this variable interacts with the defining dynamics of the scene per se. Our evaluations verify the particular strengths and weaknesses of the baseline algorithms with respect to various scene classes and camera motion parameters. Finally, our temporal ResNet boosts recognition performance and establishes a new state-of-the-art on dynamic scene recognition, as well as on the complementary task of action recognition.
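The idea of augmenting a spatial ResNet with "convolutions across time" that add "temporal residuals" can be illustrated with a minimal sketch. The code below is not the authors' implementation: the per-frame feature layout, the function names `temporal_conv` and `temporal_residual_unit`, the single filter shared across channels, and the zero padding are all assumptions made for illustration. It shows only the general pattern of an identity path plus a residual computed by filtering along the time axis.

```python
from typing import List

# Hypothetical layout: one feature vector (list of channel values) per time step.
Frame = List[float]


def temporal_conv(frames: List[Frame], kernel: List[float]) -> List[Frame]:
    """Filter each channel along the time axis with zero padding.

    As in most deep-learning code, this is a cross-correlation: the kernel
    is not flipped. The same kernel is shared by all channels (assumption).
    """
    if len(kernel) % 2 != 1:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    num_frames = len(frames)
    num_channels = len(frames[0]) if frames else 0
    out: List[Frame] = []
    for t in range(num_frames):
        acc = [0.0] * num_channels
        for k, weight in enumerate(kernel):
            src = t + k - half
            if 0 <= src < num_frames:  # frames outside the clip count as zero
                for c in range(num_channels):
                    acc[c] += weight * frames[src][c]
        out.append(acc)
    return out


def temporal_residual_unit(frames: List[Frame], kernel: List[float]) -> List[Frame]:
    """Identity path plus a temporal residual: frames + temporal_conv(frames)."""
    residual = temporal_conv(frames, kernel)
    return [[x + r for x, r in zip(frame, res)] for frame, res in zip(frames, residual)]


if __name__ == "__main__":
    clip = [[1.0], [2.0], [4.0]]  # three time steps, one channel
    # An all-zero temporal filter leaves the spatial features unchanged.
    print(temporal_residual_unit(clip, [0.0, 0.0, 0.0]))
    # A difference filter adds a temporal-change signal on top of the features.
    print(temporal_residual_unit(clip, [-1.0, 0.0, 1.0]))
```

One property worth noting in this sketch: with an all-zero temporal filter the unit reduces to the identity, so a network built this way starts out behaving like its purely spatial counterpart. Whether the paper initializes its temporal filters in this way is not stated in the abstract.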
