An effective fusion scheme of spatio-temporal features for human action recognition in RGB-D video

2013 International Conference on Control, Automation and Information Sciences (ICCAIS). Pub Date: 2013-11-01. DOI: 10.1109/ICCAIS.2013.6720562

Quang D. Tran, N. Ly

Citations: 7

Abstract

We investigate human action recognition by studying the effect of fusing feature streams extracted from color and depth sequences. Our contribution is two-fold. First, we present the 3DS-HONV descriptor, a spatio-temporal extension of the Histogram of Oriented Normal Vectors (HONV) designed to capture joint shape-motion cues from depth sequences. Second, we develop an RGB-D feature fusion scheme that exploits information from both the color and depth channels to extract expressive representations for action recognition in real scenarios. Despite its simplicity, the 3DS-HONV descriptor achieves state-of-the-art performance on the MSRAction3D dataset, with an overall accuracy of 88.89%. Further experiments show that the feature fusion scheme also generalizes well, achieving good results on the one-shot-learning ChaLearn Gesture Data (CGD2011).
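To make the idea of a histogram of oriented normal vectors concrete, the following is a minimal single-frame sketch. It is not the authors' 3DS-HONV implementation: the function name `normal_angle_histogram`, the bin counts, and the choice to estimate normals from finite-difference depth gradients are all assumptions made for illustration, and the spatio-temporal extension described in the abstract is not reproduced here.

```python
import numpy as np


def normal_angle_histogram(depth, bins_azimuth=8, bins_zenith=4):
    """Normalized 2D histogram of surface-normal angles for one depth frame.

    Hypothetical illustration only: bin counts and the gradient-based
    normal estimate are assumptions, not values taken from the paper.
    """
    depth = np.asarray(depth, dtype=np.float64)

    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(depth)

    # The unnormalized surface normal at each pixel is (-dz_dx, -dz_dy, 1).
    # Azimuth: direction of the normal projected onto the image plane.
    azimuth = np.arctan2(-dz_dy, -dz_dx)        # in [-pi, pi]
    # Zenith: angle between the normal and the viewing (z) axis.
    zenith = np.arctan(np.hypot(dz_dx, dz_dy))  # in [0, pi/2)

    hist, _, _ = np.histogram2d(
        azimuth.ravel(),
        zenith.ravel(),
        bins=[bins_azimuth, bins_zenith],
        range=[[-np.pi, np.pi], [0.0, np.pi / 2]],
    )

    # Normalize so the histogram sums to 1 (guarding against empty input).
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In a full pipeline one would presumably compute such histograms over local cells (and, for a spatio-temporal variant, over several frames) and concatenate them into a feature vector; those design choices are left out of this sketch.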

