Human Action Recognition Based on Improved Fusion Attention CNN and RNN

Han Zhao, Xinyu Jin
Published in: 2020 5th International Conference on Computational Intelligence and Applications (ICCIA)
Publication date: 2020-06-01
DOI: https://doi.org/10.1109/ICCIA49625.2020.00028
Citations: 4

Abstract

Attention-based models are widely used in computer vision and natural language processing, and action recognition in videos is no exception. In this paper, we develop a novel convolutional and recurrent network for action recognition that is "doubly deep" in its spatial and temporal layers. First, in the feature extraction stage, we propose an improved p-non-local operation as a simple and effective component for capturing long-distance dependencies in deep convolutional neural networks. Second, in the class prediction stage, we propose Fusion KeyLess Attention, combined with a bidirectional (forward and backward) LSTM, to learn the sequential nature of the data more efficiently and elegantly; it uses multi-epoch model fusion based on the confusion matrix. Experiments on two heterogeneous datasets, HMDB51 and Hollywood2, show that our model has distinct advantages over traditional CNN- and RNN-based action recognition models that likewise use only RGB features.
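
As a rough illustration of the two attention components named in the abstract, the Python sketch below shows a generic non-local operation and a keyless attention pooling step. It is not the authors' implementation: the abstract does not specify the "improved p-non-local" variant, the fusion step, or the confusion-matrix-based model fusion, so the code shows only the generic non-local form (softmax over pairwise dot products, identity embeddings, residual connection) and plain keyless attention, where scores are computed from the hidden states alone. The function names, the scoring vector `w`, and the toy inputs are hypothetical, and `hidden_states` merely stands in for bidirectional LSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def non_local(features):
    """Generic non-local operation (assumed form, not the paper's p-non-local).

    Each position i is updated with a softmax-weighted sum over ALL
    positions j, so distant positions can interact in a single step.
    """
    out = []
    for x_i in features:
        weights = softmax([dot(x_i, x_j) for x_j in features])
        y_i = [
            sum(w * x_j[d] for w, x_j in zip(weights, features))
            for d in range(len(x_i))
        ]
        # Residual connection: add the aggregated response to the input.
        out.append([a + b for a, b in zip(x_i, y_i)])
    return out


def keyless_attention_pool(hidden_states, w):
    """Keyless attention pooling over a sequence of hidden states.

    Scores depend only on the hidden states (no external query or key);
    `w` is a hypothetical scoring vector that would be learned in practice.
    """
    weights = softmax([dot(w, h) for h in hidden_states])
    dim = len(hidden_states[0])
    context = [
        sum(a * h[d] for a, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Toy usage: three 2-dimensional per-frame features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = non_local(frames)
context, attn = keyless_attention_pool(mixed, w=[0.5, -0.5])
```

In a full model, the pooled `context` vector would be passed to a classifier over action classes. Here it only shows how a variable-length sequence collapses into one fixed-size vector.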