Hidden States Exploration for 3D Skeleton-Based Gesture Recognition

2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Pub Date: 2019-01-01. DOI: 10.1109/WACV.2019.00201

Xin Liu, Henglin Shi, Xiaopeng Hong, Haoyu Chen, D. Tao, Guoying Zhao


Citations: 11

Abstract

3D skeletal data has recently attracted wide attention in human behavior analysis for its robustness to varying scenes, yet accurate gesture recognition remains challenging. The main reason is the high intra-class variance caused by temporal dynamics. One solution is to resort to generative models such as the hidden Markov model (HMM). However, existing methods commonly assume fixed anchors for each hidden state, which makes it hard to depict the explicit temporal structure of gestures. Based on the observation that a gesture is a time series with distinctly defined phases, we propose a new formulation that builds temporal compositions of gestures by low-rank matrix decomposition. The only assumption is that the gesture's "hold" phases with static poses are linearly correlated with each other. As such, a gesture sequence can be segmented into temporal states with semantically meaningful and discriminative concepts. Furthermore, unlike traditional HMMs, which tend to use a specific distance metric for clustering and ignore temporal contextual information when estimating the emission probability, we use a Long Short-Term Memory (LSTM) network to learn probability distributions over the HMM states. The proposed method is validated on two challenging datasets. Experiments demonstrate that our approach works effectively on a wide range of gestures and actions and achieves state-of-the-art performance.
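The abstract does not give the model's equations, so the following is only a minimal sketch of the general hybrid idea it describes: per-frame state probabilities (which an LSTM would produce) are decoded through an HMM whose transitions enforce an ordered sequence of gesture phases. The function name `viterbi`, the three-state left-to-right transition matrix, and the hand-written probabilities standing in for LSTM outputs are all assumptions made for illustration, not the authors' implementation. The sketch also uses the per-frame probabilities directly as emission scores, which is a simplification.

```python
import numpy as np


def viterbi(log_emission, log_transition, log_initial):
    """Return the most likely state path for one sequence.

    log_emission:   (T, S) per-frame log scores for each state. In a hybrid
                    LSTM-HMM these would come from the network; here they
                    are hand-written stand-ins.
    log_transition: (S, S), entry [i, j] = log P(state j at t+1 | state i at t).
    log_initial:    (S,) log probabilities of the initial state.
    """
    T, S = log_emission.shape
    score = np.full((T, S), -np.inf)
    backptr = np.zeros((T, S), dtype=int)
    score[0] = log_initial + log_emission[0]
    for t in range(1, T):
        # candidate[i, j]: best score ending in state i at t-1, then moving to j
        candidate = score[t - 1][:, None] + log_transition
        backptr[t] = np.argmax(candidate, axis=0)
        score[t] = candidate[backptr[t], np.arange(S)] + log_emission[t]
    # Trace the best path backwards from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


# Hypothetical per-frame state probabilities (stand-ins for LSTM outputs).
# Frame 3 is noisy: taken alone, it prefers state 0 again.
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
])

# Assumed left-to-right HMM: a state can only persist or advance by one.
transition = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
])
initial = np.array([1.0, 0.0, 0.0])

with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
    path = viterbi(np.log(probs), np.log(transition), np.log(initial))

framewise = probs.argmax(axis=1)
print("frame-wise argmax:", framewise.tolist())  # [0, 0, 1, 0, 2, 2]
print("Viterbi path:     ", path.tolist())       # [0, 0, 1, 1, 2, 2]
```

In this toy example the frame-wise argmax jumps back to state 0 at frame 3, while the decoded path stays in state 1 because the assumed transition structure forbids moving backwards. This is the kind of temporal context that per-frame classification alone ignores.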

