{"title":"深度序列中人体动作识别关节形状运动线索的稀疏时空表示","authors":"Quang D. Tran, N. Ly","doi":"10.1109/RIVF.2013.6719903","DOIUrl":null,"url":null,"abstract":"The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification, 3D reconstruction, etc.We address here the problems of human action recognition in depth sequences. On one hand, we present a new joint shape-motion descriptor which we call as 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in a 3D spherical coordinate. We further prove that the Optical Flow fields in depth sequences could be used in conjunction with the presented descriptor to augment the ability of capturing in-plane movements; the experiments later show that this combination is more efficient than the standalone 3DS-HONV. In addition, a discriminative dictionary learning and feature representation via Sparse Coding is applied to proposed descriptors to relieve the intrinsic effects of noise and capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state-of-the-art on two challenging benchmarks, which results with an overall accuracy of 91.92% on the MSRAction3D and 93.31% on the MSRGesture3D datasets, respectively.","PeriodicalId":121216,"journal":{"name":"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences\",\"authors\":\"Quang D. Tran, N. Ly\",\"doi\":\"10.1109/RIVF.2013.6719903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification, 3D reconstruction, etc.We address here the problems of human action recognition in depth sequences. On one hand, we present a new joint shape-motion descriptor which we call as 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in a 3D spherical coordinate. We further prove that the Optical Flow fields in depth sequences could be used in conjunction with the presented descriptor to augment the ability of capturing in-plane movements; the experiments later show that this combination is more efficient than the standalone 3DS-HONV. In addition, a discriminative dictionary learning and feature representation via Sparse Coding is applied to proposed descriptors to relieve the intrinsic effects of noise and capture high-level patterns. 
By learning these sparse and distinctive representations, we demonstrate large improvements over the state-of-the-art on two challenging benchmarks, which results with an overall accuracy of 91.92% on the MSRAction3D and 93.31% on the MSRGesture3D datasets, respectively.\",\"PeriodicalId\":121216,\"journal\":{\"name\":\"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)\",\"volume\":\"115 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RIVF.2013.6719903\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RIVF.2013.6719903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences
The recent availability of 3D sensors has made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks such as object/action classification and 3D reconstruction. We address here the problem of human action recognition in depth sequences. First, we present a new joint shape-motion descriptor, the 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), a spatio-temporal extension of the original HONV quantized in 3D spherical coordinates. We further show that Optical Flow fields computed on depth sequences can be used in conjunction with the proposed descriptor to better capture in-plane movements; our experiments confirm that this combination outperforms 3DS-HONV alone. In addition, discriminative dictionary learning and feature representation via Sparse Coding are applied to the proposed descriptors to mitigate the intrinsic effects of noise and to capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state of the art on two challenging benchmarks, achieving overall accuracies of 91.92% on the MSRAction3D dataset and 93.31% on the MSRGesture3D dataset.
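To make the descriptor idea concrete, below is a minimal, self-contained sketch in Python of a HONV-style histogram of oriented normal vectors computed from a single depth frame, followed by a generic sparse-coding step using scikit-learn's DictionaryLearning. The bin counts, the gradient-based normal estimation, the synthetic depth frames, and the choice of DictionaryLearning are illustrative assumptions; the paper's actual 3DS-HONV additionally quantizes over the temporal dimension in 3D spherical coordinates, fuses Optical Flow cues, and learns a discriminative dictionary, none of which is reproduced here.

```python
# Illustrative sketch only: a per-frame HONV-style descriptor plus sparse coding.
# This is NOT the authors' exact 3DS-HONV pipeline; see the caveats above.
import numpy as np
from sklearn.decomposition import DictionaryLearning

def honv_histogram(depth, n_theta=8, n_phi=8):
    """Histogram of oriented normal vectors for one depth map (H x W depth values)."""
    # Surface normals from depth gradients: n is proportional to (-dz/dx, -dz/dy, 1).
    dzdy, dzdx = np.gradient(depth.astype(np.float64))
    nx, ny, nz = -dzdx, -dzdy, np.ones_like(depth, dtype=np.float64)
    norm = np.sqrt(nx**2 + ny**2 + nz**2)
    nx, ny, nz = nx / norm, ny / norm, nz / norm

    # Spherical angles of each normal: azimuth theta in [-pi, pi], zenith phi in [0, pi].
    theta = np.arctan2(ny, nx)
    phi = np.arccos(np.clip(nz, -1.0, 1.0))

    # Joint 2D histogram over (theta, phi), flattened into one descriptor vector.
    hist, _, _ = np.histogram2d(
        theta.ravel(), phi.ravel(),
        bins=[n_theta, n_phi],
        range=[[-np.pi, np.pi], [0.0, np.pi]],
    )
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-12)  # L1-normalise

# Example: descriptors from a short sequence of synthetic depth frames,
# then a sparse representation learned with DictionaryLearning
# (a generic stand-in for the discriminative dictionary learning in the paper).
rng = np.random.default_rng(0)
frames = [rng.random((120, 160)) for _ in range(20)]   # fake depth maps
X = np.stack([honv_histogram(f) for f in frames])      # (20, 64) descriptors

dico = DictionaryLearning(n_components=16, transform_algorithm="omp",
                          transform_n_nonzero_coefs=4, random_state=0)
codes = dico.fit(X).transform(X)                       # sparse codes, shape (20, 16)
print(codes.shape, (codes != 0).sum(axis=1))           # at most 4 non-zeros per code
```

In a full pipeline along the lines described in the abstract, such per-frame (or per-cell) histograms would be aggregated over space and time into the final descriptor, and the resulting sparse codes would feed a classifier; those stages are omitted here for brevity.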