Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Pub Date: 2018-06-01. DOI: 10.1109/CVPR.2018.00929

Siddhartha Chandra, C. Couprie, Iasonas Kokkinos

Citations: 49

Abstract

In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space and time. We show that we are able to perform exact and efficient inference on a densely-connected spatio-temporal graph by capitalizing on recent advances in deep Gaussian Conditional Random Fields (GCRFs). Our method, called VideoGCRF, (a) is efficient, (b) has a unique global minimum, and (c) can be trained end-to-end alongside contemporary deep networks for video understanding. We experiment with multiple connectivity patterns in the temporal domain, and present empirical improvements over strong baselines on the tasks of both semantic and instance segmentation of videos. Our implementation is based on the Caffe2 framework and will be available at https://github.com/siddharthachandra/gcrf-v3.0.
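The claims of exact inference and a unique global minimum can be illustrated on a toy problem. The sketch below assumes the usual Gaussian CRF formulation, in which inference minimizes a quadratic energy E(x) = 1/2 x^T (A + lam*I) x - b^T x and so reduces to solving the linear system (A + lam*I) x = b; the abstract itself does not spell this out. The function names, the random embeddings used to build the pairwise terms, the graph size, and lam = 1 are all hypothetical choices made for illustration. This is plain Python, not the authors' Caffe2 implementation.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, x):
    return [dot(row, x) for row in Q]


def build_system(num_frames, pixels_per_frame, dim=3, lam=1.0, seed=0):
    """Toy densely-connected spatio-temporal GCRF (all values are made up).

    Each (frame, pixel) node gets a random embedding. Pairwise terms are
    inner products of embeddings, so A is positive semi-definite and
    Q = A + lam * I is positive definite for lam > 0.
    """
    rng = random.Random(seed)
    n = num_frames * pixels_per_frame  # nodes across all frames
    emb = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in unary scores
    Q = [[dot(emb[i], emb[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return Q, b


def conjugate_gradient(Q, b, tol=1e-10, max_iter=1000):
    """Solve Q x = b for symmetric positive definite Q."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - Q x, with x = 0
    p = list(r)  # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Qp = matvec(Q, p)
        alpha = rs / dot(p, Qp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Qp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def energy(Q, b, x):
    """Quadratic energy 1/2 x^T Q x - b^T x."""
    return 0.5 * dot(x, matvec(Q, x)) - dot(b, x)


# 3 frames of 4 pixels each: 12 nodes, every pair connected.
Q, b = build_system(num_frames=3, pixels_per_frame=4)
x_star = conjugate_gradient(Q, b)
```

Because Q is positive definite in this sketch, the energy is strictly convex: x_star is its only minimizer, and moving any coordinate away from it strictly increases the energy.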



