An effective fusion scheme of spatio-temporal features for human action recognition in RGB-D video

Quang D. Tran, N. Ly
Published in: 2013 International Conference on Control, Automation and Information Sciences (ICCAIS), November 2013
DOI: 10.1109/ICCAIS.2013.6720562 (https://doi.org/10.1109/ICCAIS.2013.6720562)
Citations: 7

Abstract

We investigate human action recognition by studying the effect of fusing feature streams extracted from color and depth sequences. Our contribution is twofold. First, we present the 3DS-HONV descriptor, a spatio-temporal extension of the Histogram of Oriented Normal Vectors (HONV) designed to capture joint shape and motion cues from depth sequences. Second, we develop an effective RGB-D feature fusion scheme that exploits information from both the color and depth channels to extract expressive representations for action recognition in real scenarios. Despite its simplicity, the 3DS-HONV descriptor achieves state-of-the-art performance on the MSRAction3D dataset, with an overall accuracy of 88.89%. Further experiments show that the proposed fusion scheme also generalizes well, achieving good results on the one-shot-learning ChaLearn Gesture Data (CGD2011).
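The abstract does not specify how 3DS-HONV or the fusion scheme is computed. The sketch below is a rough illustration under stated assumptions and is not the authors' method. It assumes one plausible space-time extension of HONV. For each voxel, central differences give the depth slopes z_x, z_y and z_t. The spatial normal (-z_x, -z_y, 1) is then described by its zenith and azimuth angles, as in HONV, and the temporal slope z_t is added as a third angle to capture motion. The fusion step assumes simple weighted concatenation of L2-normalized color and depth features. The function names, the bin counts, and the `depth_weight` parameter are all hypothetical.

```python
import math
from typing import List, Sequence

# A depth video indexed as depth[t][y][x] (frames x rows x columns).
DepthVideo = Sequence[Sequence[Sequence[float]]]


def _bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value in [lo, hi] to one of n_bins equal-width bins (clamped)."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def spacetime_normal_histogram(
    depth: DepthVideo, n_zenith: int = 4, n_azimuth: int = 8, n_temporal: int = 4
) -> List[float]:
    """Hypothetical HONV-style space-time histogram (NOT the paper's exact 3DS-HONV).

    Each interior voxel votes into a joint (zenith, azimuth, temporal) bin;
    the result is flattened and L1-normalized.
    """
    T, H, W = len(depth), len(depth[0]), len(depth[0][0])
    if min(T, H, W) < 3:
        raise ValueError("need at least 3 frames, rows and columns")
    counts = [0] * (n_zenith * n_azimuth * n_temporal)
    total = 0
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                # Central differences of depth along x, y and time.
                zx = (depth[t][y][x + 1] - depth[t][y][x - 1]) / 2.0
                zy = (depth[t][y + 1][x] - depth[t][y - 1][x]) / 2.0
                zt = (depth[t + 1][y][x] - depth[t - 1][y][x]) / 2.0
                # Zenith of the normal (-zx, -zy, 1): angle from the viewing axis.
                zenith = math.atan(math.hypot(zx, zy))  # in [0, pi/2)
                # Azimuth of the depth gradient, which fixes the normal's in-plane direction.
                azimuth = math.atan2(zy, zx)  # in [-pi, pi]
                # Temporal slope angle: nonzero when the surface moves in depth.
                temporal = math.atan(zt)  # in (-pi/2, pi/2)
                i = _bin(zenith, 0.0, math.pi / 2, n_zenith)
                j = _bin(azimuth, -math.pi, math.pi, n_azimuth)
                k = _bin(temporal, -math.pi / 2, math.pi / 2, n_temporal)
                counts[(i * n_azimuth + j) * n_temporal + k] += 1
                total += 1
    # L1-normalize so that sequences of different sizes are comparable.
    return [c / total for c in counts]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else [float(a) for a in v]


def fuse_early(
    color_feat: Sequence[float], depth_feat: Sequence[float], depth_weight: float = 0.5
) -> List[float]:
    """Hypothetical early fusion: normalize each stream, weight it, then concatenate."""
    c = l2_normalize(color_feat)
    d = l2_normalize(depth_feat)
    return [(1.0 - depth_weight) * a for a in c] + [depth_weight * b for b in d]
```

Normalizing each stream before concatenation keeps a high-magnitude color descriptor from dominating the depth histogram. In this sketch, `depth_weight` is the single knob that trades the two channels off against each other.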