Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition

Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu
{"title":"Skeleton-Indexed Deep Multi-Modal Feature Learning for High Performance Human Action Recognition","authors":"Sijie Song, Cuiling Lan, Junliang Xing, Wenjun Zeng, Jiaying Liu","doi":"10.1109/ICME.2018.8486486","DOIUrl":null,"url":null,"abstract":"This paper presents a new framework for action recognition with multi-modal data. A skeleton-indexed feature learning procedure is developed to further exploit the detailed local features from RGB and optical flow videos. In particular, the proposed framework is built based on a deep Convolutional Network (ConvNet) and a Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM). A skeleton-indexed transform layer is designed to automatically extract visual features around key joints, and a part-aggregated pooling is developed to uniformly regulate the visual features from different body parts and actors. Besides, several fusion schemes are explored to take advantage of multi-modal data. The proposed deep architecture is end-to-end trainable and can better incorporate different modalities to learn effective feature representations. Quantitative experiment results on two datasets, the NTU RGB+D dataset and the MSR dataset, demonstrate the excellent performance of our scheme over other state-of-the-arts. To our knowledge, the performance obtained by the proposed framework is currently the best on the challenging NTU RGB+D dataset.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2018.8486486","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

This paper presents a new framework for action recognition with multi-modal data. A skeleton-indexed feature learning procedure is developed to further exploit the detailed local features from RGB and optical flow videos. In particular, the proposed framework is built on a deep Convolutional Network (ConvNet) and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). A skeleton-indexed transform layer is designed to automatically extract visual features around key joints, and a part-aggregated pooling is developed to uniformly regulate the visual features from different body parts and actors. In addition, several fusion schemes are explored to take advantage of multi-modal data. The proposed deep architecture is end-to-end trainable and can better incorporate different modalities to learn effective feature representations. Quantitative experimental results on two datasets, the NTU RGB+D dataset and the MSR dataset, demonstrate the superior performance of our scheme over other state-of-the-art methods. To our knowledge, the performance obtained by the proposed framework is currently the best on the challenging NTU RGB+D dataset.
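The abstract names two components that are easy to picture in code: a skeleton-indexed transform layer that reads ConvNet features at the locations of key joints, and a part-aggregated pooling that turns a variable set of joint features into a fixed-size per-frame vector for the LSTM. Below is a minimal PyTorch sketch of that idea. It is not the authors' implementation; the joint-to-part grouping, feature-map size, and the use of bilinear grid sampling to index the feature map are assumptions for illustration.

```python
# Minimal sketch (not the authors' code) of a skeleton-indexed transform
# plus part-aggregated pooling. Joint grouping and shapes are assumptions.
import torch
import torch.nn.functional as F

# Hypothetical grouping of 16 joint indices into five body parts.
PARTS = {
    "torso": [0, 1, 2, 3],
    "left_arm": [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg": [10, 11, 12],
    "right_leg": [13, 14, 15],
}

def skeleton_indexed_features(feat_map, joints_xy):
    """Sample ConvNet features at 2D joint positions.

    feat_map:  (N, C, H, W) feature map from the ConvNet.
    joints_xy: (N, J, 2) joint coordinates normalized to [-1, 1].
    Returns:   (N, J, C) per-joint feature vectors.
    """
    # grid_sample expects a (N, H_out, W_out, 2) grid; use a 1 x J grid
    # so each joint becomes one bilinearly sampled output location.
    grid = joints_xy.unsqueeze(1)                                  # (N, 1, J, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=False)   # (N, C, 1, J)
    return sampled.squeeze(2).transpose(1, 2)                      # (N, J, C)

def part_aggregated_pooling(joint_feats):
    """Max-pool joint features within each body part, then concatenate,
    giving a fixed-size vector regardless of per-part joint counts."""
    pooled = [joint_feats[:, idx, :].max(dim=1).values for idx in PARTS.values()]
    return torch.cat(pooled, dim=1)                                # (N, parts * C)

# Example: a batch of 2 frames with 16 joints each; the resulting
# per-frame vector would feed one step of the LSTM.
feats = torch.randn(2, 256, 14, 14)
joints = torch.rand(2, 16, 2) * 2 - 1
frame_vec = part_aggregated_pooling(skeleton_indexed_features(feats, joints))
print(frame_vec.shape)  # torch.Size([2, 1280])
```

Pooling within fixed body-part groups, as sketched here, is one plausible way to "uniformly regulate" features from different parts and actors: the output dimension stays constant even if joints are missing or a second actor's joints are pooled into the same part slots.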