Representing Videos Using Mid-level Discriminative Patches

2013 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2013-06-23. DOI: 10.1109/CVPR.2013.332

Arpit Jain, A. Gupta, Mikel D. Rodriguez, L. Davis

Citations: 155

Abstract

How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.

