Action Recognition Based on the Modified Twostream CNN

Dan Zheng, Hang Li, Shoulin Yin
{"title":"Action Recognition Based on the Modified Twostream CNN","authors":"Dan Zheng, Hang Li, Shoulin Yin","doi":"10.5815/ijmsc.2020.06.03","DOIUrl":null,"url":null,"abstract":"Human action recognition is an important research direction in computer vision areas. Its main content is to simulate human brain to analyze and recognize human action in video. It usually includes individual actions, interactions between people and the external environment. Space-time dual-channel neural network can represent the features of video from both spatial and temporal perspectives. Compared with other neural network models, it has more advantages in human action recognition. In this paper, a action recognition method based on improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal length non-overlapping segments, and a frame image representing the static feature of the video and a stacked optical flow image representing the motion feature are sampled at random part from each segment. Then these two kinds of images are input into the spatial domain and the temporal domain convolutional neural network respectively for feature extraction, and then the segmented features of each video are fused in the two channels respectively to obtain the category prediction features of the spatial domain and the temporal domain. Finally, the video action recognition results are obtained by integrating the predictive features of the two channels. Through experiments, various data enhancement methods and transfer learning schemes are discussed to solve the over-fitting problem caused by insufficient training samples, and the effects of different segmental number, pre-training network, segmental feature fusion scheme and dual-channel integration strategy on action recognition performance are analyzed. The experiment results show that the proposed model can better learn the human action features in a complex video and better recognize the action.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijmsc.2020.06.03","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Human action recognition is an important research direction in computer vision. Its aim is to emulate the human brain in analyzing and recognizing human actions in video, covering individual actions as well as interactions between people and with the external environment. A space-time two-channel neural network can represent video features from both spatial and temporal perspectives and, compared with other neural network models, offers clear advantages for human action recognition. In this paper, an action recognition method based on an improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal-length, non-overlapping segments; from each segment, a frame image representing the static appearance of the video and a stacked optical-flow image representing its motion are sampled at a random position. These two kinds of images are then fed into the spatial-domain and temporal-domain convolutional neural networks, respectively, for feature extraction, and the per-segment features of each video are fused within each channel to obtain category prediction features for the spatial and temporal domains. Finally, the video action recognition result is obtained by integrating the predictions of the two channels. Through experiments, various data augmentation methods and transfer learning schemes are discussed to alleviate the over-fitting caused by insufficient training samples, and the effects of the number of segments, the pre-trained network, the segment feature fusion scheme, and the two-channel integration strategy on recognition performance are analyzed. The experimental results show that the proposed model learns human action features in complex videos more effectively and recognizes the actions more accurately.
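The pipeline described above (segment the video, sample one RGB frame and one stacked optical-flow image per segment, extract features with a spatial and a temporal CNN, fuse per-segment predictions within each channel, then integrate the two channels) can be sketched as follows. This is a minimal illustrative PyTorch implementation: the ResNet-18 backbone, three segments, ten-frame flow stack, average segment fusion, and the 1:1.5 channel weighting are assumptions made for the sketch, not parameters reported in the paper.

```python
# Minimal sketch of a segment-based two-channel (two-stream) action recognition
# network, assuming a ResNet-18 backbone and average segment fusion.
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamSegmentNet(nn.Module):
    def __init__(self, num_classes: int, num_segments: int = 3, flow_stack: int = 10):
        super().__init__()
        self.num_segments = num_segments

        # Spatial channel: a standard image CNN applied to one RGB frame per segment.
        self.spatial = models.resnet18(weights=None)
        self.spatial.fc = nn.Linear(self.spatial.fc.in_features, num_classes)

        # Temporal channel: same backbone, but the first conv accepts a stack of
        # 2 * flow_stack optical-flow channels (x and y displacement per frame).
        self.temporal = models.resnet18(weights=None)
        self.temporal.conv1 = nn.Conv2d(2 * flow_stack, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        self.temporal.fc = nn.Linear(self.temporal.fc.in_features, num_classes)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb:  (batch, segments, 3, H, W)        one frame sampled per segment
        # flow: (batch, segments, 2*stack, H, W)  stacked optical flow per segment
        b, s = rgb.shape[:2]

        # Per-segment class scores from each channel.
        spatial_scores = self.spatial(rgb.flatten(0, 1)).view(b, s, -1)
        temporal_scores = self.temporal(flow.flatten(0, 1)).view(b, s, -1)

        # Segment fusion: average the per-segment predictions within each channel.
        spatial_video = spatial_scores.mean(dim=1)
        temporal_video = temporal_scores.mean(dim=1)

        # Two-channel integration: a weighted sum of the channel predictions
        # (the 1:1.5 spatial/temporal weighting is an illustrative assumption).
        return spatial_video + 1.5 * temporal_video


if __name__ == "__main__":
    model = TwoStreamSegmentNet(num_classes=101, num_segments=3)
    rgb = torch.randn(2, 3, 3, 224, 224)    # 2 videos, 3 segments, RGB frames
    flow = torch.randn(2, 3, 20, 224, 224)  # 10 flow frames -> 20 channels
    print(model(rgb, flow).shape)           # torch.Size([2, 101])
```

In practice, the pre-trained weights, the segment fusion operator (average, max, or weighted), and the channel integration strategy are exactly the design choices the paper's experiments compare; the fixed values here are placeholders.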