An implicit spatiotemporal shape model for human activity localization and recognition

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Pub Date: 2009-06-20. DOI: 10.1109/CVPRW.2009.5204262

A. Oikonomopoulos, I. Patras, M. Pantic


Citations: 35

Abstract

In this paper we address the problem of localization and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity, which relies on the spatiotemporal localization of characteristic, sparse 'visual words' and 'visual verbs'. Evidence for the spatiotemporal localization of the activity is accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, taking the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur, are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation to spatially segment the subject that performs the activities in every frame, and the Radon transform to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.
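
The spatial voting and mean shift step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cast_votes`, `mean_shift_mode`), the Gaussian kernel, the bandwidth, and the toy data are all assumptions chosen for illustration. Each matched codeword votes for a subject centre by subtracting a stored offset from its image position, and mean shift then climbs to the densest cluster of votes.

```python
import math

def cast_votes(positions, offsets):
    """Each codeword at (px, py) votes for a centre at (px - ox, py - oy).

    `offsets` stands in for the codeword-to-centre displacements that
    would be stored at training time (hypothetical values here).
    """
    return [(px - ox, py - oy) for (px, py), (ox, oy) in zip(positions, offsets)]

def mean_shift_mode(votes, weights, start, bandwidth=5.0, tol=1e-3, max_iter=100):
    """Weighted mean shift with a Gaussian kernel, starting from `start`."""
    x, y = start
    for _ in range(max_iter):
        sw = sx = sy = 0.0
        for (vx, vy), w in zip(votes, weights):
            d2 = (vx - x) ** 2 + (vy - y) ** 2
            k = w * math.exp(-0.5 * d2 / bandwidth ** 2)
            sw += k
            sx += k * vx
            sy += k * vy
        if sw == 0.0:  # no vote close enough to influence the estimate
            break
        nx, ny = sx / sw, sy / sw
        shift = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if shift < tol:
            break
    return x, y

# Toy data (assumed): four consistent votes near (50, 40) and one clutter vote.
positions = [(60, 50), (45, 47), (52, 30), (50, 45), (130, 20)]
offsets = [(10, 10), (-6, 6), (3, -9), (0, 4), (10, 10)]
votes = cast_votes(positions, offsets)
weights = [1.0] * len(votes)
mode = mean_shift_mode(votes, weights, start=votes[0])
print(mode)
```

Because the kernel weight decays quickly with distance, the isolated clutter vote has almost no effect on the estimated mode, which mirrors the abstract's point that local voting tolerates clutter and occlusion.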

