{"title":"深度序列中人体动作识别关节形状运动线索的稀疏时空表示","authors":"Quang D. Tran, N. Ly","doi":"10.1109/RIVF.2013.6719903","DOIUrl":null,"url":null,"abstract":"The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification, 3D reconstruction, etc.We address here the problems of human action recognition in depth sequences. On one hand, we present a new joint shape-motion descriptor which we call as 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in a 3D spherical coordinate. We further prove that the Optical Flow fields in depth sequences could be used in conjunction with the presented descriptor to augment the ability of capturing in-plane movements; the experiments later show that this combination is more efficient than the standalone 3DS-HONV. In addition, a discriminative dictionary learning and feature representation via Sparse Coding is applied to proposed descriptors to relieve the intrinsic effects of noise and capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state-of-the-art on two challenging benchmarks, which results with an overall accuracy of 91.92% on the MSRAction3D and 93.31% on the MSRGesture3D datasets, respectively.","PeriodicalId":121216,"journal":{"name":"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences\",\"authors\":\"Quang D. Tran, N. Ly\",\"doi\":\"10.1109/RIVF.2013.6719903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification, 3D reconstruction, etc.We address here the problems of human action recognition in depth sequences. On one hand, we present a new joint shape-motion descriptor which we call as 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in a 3D spherical coordinate. We further prove that the Optical Flow fields in depth sequences could be used in conjunction with the presented descriptor to augment the ability of capturing in-plane movements; the experiments later show that this combination is more efficient than the standalone 3DS-HONV. In addition, a discriminative dictionary learning and feature representation via Sparse Coding is applied to proposed descriptors to relieve the intrinsic effects of noise and capture high-level patterns. 
By learning these sparse and distinctive representations, we demonstrate large improvements over the state-of-the-art on two challenging benchmarks, which results with an overall accuracy of 91.92% on the MSRAction3D and 93.31% on the MSRGesture3D datasets, respectively.\",\"PeriodicalId\":121216,\"journal\":{\"name\":\"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)\",\"volume\":\"115 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RIVF.2013.6719903\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RIVF.2013.6719903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences
The recent availability of 3D sensors has made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks such as object/action classification and 3D reconstruction. We address here the problem of human action recognition in depth sequences. First, we present a new joint shape-motion descriptor, the 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), a spatio-temporal extension of the original HONV quantized in 3D spherical coordinates. We further show that Optical Flow fields computed on depth sequences can be used in conjunction with the proposed descriptor to better capture in-plane movements; our experiments confirm that this combination outperforms 3DS-HONV alone. In addition, discriminative dictionary learning and feature representation via Sparse Coding are applied to the proposed descriptors to mitigate the intrinsic effects of noise and to capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state of the art on two challenging benchmarks, achieving overall accuracies of 91.92% on the MSRAction3D dataset and 93.31% on the MSRGesture3D dataset.
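To make the descriptor idea concrete, below is a minimal, self-contained sketch in Python of a HONV-style histogram of oriented normal vectors computed from a single depth frame, followed by a generic sparse-coding step using scikit-learn's DictionaryLearning. The bin counts, the gradient-based normal estimation, the synthetic depth frames, and the choice of DictionaryLearning are illustrative assumptions; the paper's actual 3DS-HONV additionally quantizes over the temporal dimension in 3D spherical coordinates, fuses Optical Flow cues, and learns a discriminative dictionary, none of which is reproduced here.

```python
# Illustrative sketch only: a per-frame HONV-style descriptor plus sparse coding.
# This is NOT the authors' exact 3DS-HONV pipeline; see the caveats above.
import numpy as np
from sklearn.decomposition import DictionaryLearning

def honv_histogram(depth, n_theta=8, n_phi=8):
    """Histogram of oriented normal vectors for one depth map (H x W depth values)."""
    # Surface normals from depth gradients: n is proportional to (-dz/dx, -dz/dy, 1).
    dzdy, dzdx = np.gradient(depth.astype(np.float64))
    nx, ny, nz = -dzdx, -dzdy, np.ones_like(depth, dtype=np.float64)
    norm = np.sqrt(nx**2 + ny**2 + nz**2)
    nx, ny, nz = nx / norm, ny / norm, nz / norm

    # Spherical angles of each normal: azimuth theta in [-pi, pi], zenith phi in [0, pi].
    theta = np.arctan2(ny, nx)
    phi = np.arccos(np.clip(nz, -1.0, 1.0))

    # Joint 2D histogram over (theta, phi), flattened into one descriptor vector.
    hist, _, _ = np.histogram2d(
        theta.ravel(), phi.ravel(),
        bins=[n_theta, n_phi],
        range=[[-np.pi, np.pi], [0.0, np.pi]],
    )
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-12)  # L1-normalise

# Example: descriptors from a short sequence of synthetic depth frames,
# then a sparse representation learned with DictionaryLearning
# (a generic stand-in for the discriminative dictionary learning in the paper).
rng = np.random.default_rng(0)
frames = [rng.random((120, 160)) for _ in range(20)]   # fake depth maps
X = np.stack([honv_histogram(f) for f in frames])      # (20, 64) descriptors

dico = DictionaryLearning(n_components=16, transform_algorithm="omp",
                          transform_n_nonzero_coefs=4, random_state=0)
codes = dico.fit(X).transform(X)                       # sparse codes, shape (20, 16)
print(codes.shape, (codes != 0).sum(axis=1))           # at most 4 non-zeros per code
```

In a full pipeline along the lines described in the abstract, such per-frame (or per-cell) histograms would be aggregated over space and time into the final descriptor, and the resulting sparse codes would feed a classifier; those stages are omitted here for brevity.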