识别人类行为是姿态估计图的进化

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI:10.1109/CVPR.2018.00127

Mengyuan Liu, Junsong Yuan

{"title":"识别人类行为是姿态估计图的进化","authors":"Mengyuan Liu, Junsong Yuan","doi":"10.1109/CVPR.2018.00127","DOIUrl":null,"url":null,"abstract":"Most video-based action recognition approaches choose to extract features from the whole video to recognize actions. The cluttered background and non-action motions limit the performances of these methods, since they lack the explicit modeling of human body movements. With recent advances of human pose estimation, this work presents a novel method to recognize human action as the evolution of pose estimation maps. Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of human body to benefit action recognition. Specifically, the evolution of pose estimation maps can be decomposed as an evolution of heatmaps, e.g., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively. Considering the sparse property of heatmap, we develop spatial rank pooling to aggregate the evolution of heatmaps as a body shape evolution image. As body shape evolution image does not differentiate body parts, we design body guided sampling to aggregate the evolution of poses as a body pose evolution image. The complementary properties between both types of images are explored by deep convolutional neural networks to predict action label. Experiments on NTU RGB+D, UTD-MHAD and PennAction datasets verify the effectiveness of our method, which outperforms most state-of-the-art methods.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"11 1","pages":"1159-1168"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"244","resultStr":"{\"title\":\"Recognizing Human Actions as the Evolution of Pose Estimation Maps\",\"authors\":\"Mengyuan Liu, Junsong Yuan\",\"doi\":\"10.1109/CVPR.2018.00127\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Most video-based action recognition approaches choose to extract features from the whole video to recognize actions. The cluttered background and non-action motions limit the performances of these methods, since they lack the explicit modeling of human body movements. With recent advances of human pose estimation, this work presents a novel method to recognize human action as the evolution of pose estimation maps. Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of human body to benefit action recognition. Specifically, the evolution of pose estimation maps can be decomposed as an evolution of heatmaps, e.g., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively. Considering the sparse property of heatmap, we develop spatial rank pooling to aggregate the evolution of heatmaps as a body shape evolution image. As body shape evolution image does not differentiate body parts, we design body guided sampling to aggregate the evolution of poses as a body pose evolution image. The complementary properties between both types of images are explored by deep convolutional neural networks to predict action label. Experiments on NTU RGB+D, UTD-MHAD and PennAction datasets verify the effectiveness of our method, which outperforms most state-of-the-art methods.\",\"PeriodicalId\":6564,\"journal\":{\"name\":\"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition\",\"volume\":\"11 1\",\"pages\":\"1159-1168\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"244\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.2018.00127\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2018.00127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 244

摘要

大多数基于视频的动作识别方法选择从整个视频中提取特征来识别动作。由于这些方法缺乏对人体运动的明确建模，因此混乱的背景和非动作运动限制了这些方法的性能。随着人体姿态估计的最新进展，本文提出了一种新的方法来识别人体动作作为姿态估计地图的演变。我们观察到，姿态估计的副产品姿态估计图保留了更丰富的人体线索，有利于动作识别，而不是依赖于从视频中估计的不准确的人体姿态。具体来说，姿态估计图的演化可以分解为热图(例如概率图)的演化和估计的二维人体姿态的演化，它们分别表示身体形状和身体姿态的变化。考虑到热图的稀疏性，我们开发了空间秩池，将热图的进化聚合为一个体型进化图像。由于身体形态进化图像不区分身体部位，我们设计了身体引导采样，将姿态进化聚合为身体姿态进化图像。利用深度卷积神经网络探索两类图像之间的互补特性来预测动作标签。在NTU RGB+D, UTD-MHAD和PennAction数据集上的实验验证了我们方法的有效性，它优于大多数最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Recognizing Human Actions as the Evolution of Pose Estimation Maps

Most video-based action recognition approaches choose to extract features from the whole video to recognize actions. The cluttered background and non-action motions limit the performances of these methods, since they lack the explicit modeling of human body movements. With recent advances of human pose estimation, this work presents a novel method to recognize human action as the evolution of pose estimation maps. Instead of relying on the inaccurate human poses estimated from videos, we observe that pose estimation maps, the byproduct of pose estimation, preserve richer cues of human body to benefit action recognition. Specifically, the evolution of pose estimation maps can be decomposed as an evolution of heatmaps, e.g., probabilistic maps, and an evolution of estimated 2D human poses, which denote the changes of body shape and body pose, respectively. Considering the sparse property of heatmap, we develop spatial rank pooling to aggregate the evolution of heatmaps as a body shape evolution image. As body shape evolution image does not differentiate body parts, we design body guided sampling to aggregate the evolution of poses as a body pose evolution image. The complementary properties between both types of images are explored by deep convolutional neural networks to predict action label. Experiments on NTU RGB+D, UTD-MHAD and PennAction datasets verify the effectiveness of our method, which outperforms most state-of-the-art methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

自引率

0.00%

发文量