Action Recognition Based on the Modified Twostream CNN

Dan Zheng, Hang Li, Shoulin Yin
{"title":"Action Recognition Based on the Modified Twostream CNN","authors":"Dan Zheng, Hang Li, Shoulin Yin","doi":"10.5815/ijmsc.2020.06.03","DOIUrl":null,"url":null,"abstract":"Human action recognition is an important research direction in computer vision areas. Its main content is to simulate human brain to analyze and recognize human action in video. It usually includes individual actions, interactions between people and the external environment. Space-time dual-channel neural network can represent the features of video from both spatial and temporal perspectives. Compared with other neural network models, it has more advantages in human action recognition. In this paper, a action recognition method based on improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal length non-overlapping segments, and a frame image representing the static feature of the video and a stacked optical flow image representing the motion feature are sampled at random part from each segment. Then these two kinds of images are input into the spatial domain and the temporal domain convolutional neural network respectively for feature extraction, and then the segmented features of each video are fused in the two channels respectively to obtain the category prediction features of the spatial domain and the temporal domain. Finally, the video action recognition results are obtained by integrating the predictive features of the two channels. Through experiments, various data enhancement methods and transfer learning schemes are discussed to solve the over-fitting problem caused by insufficient training samples, and the effects of different segmental number, pre-training network, segmental feature fusion scheme and dual-channel integration strategy on action recognition performance are analyzed. The experiment results show that the proposed model can better learn the human action features in a complex video and better recognize the action.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijmsc.2020.06.03","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Human action recognition is an important research direction in computer vision. Its aim is to emulate the human brain in analyzing and recognizing human actions in video, covering individual actions as well as interactions between people and with the external environment. A space-time two-channel neural network can represent video features from both spatial and temporal perspectives and, compared with other neural network models, offers clear advantages for human action recognition. In this paper, an action recognition method based on an improved space-time two-channel convolutional neural network is proposed. First, the video is divided into several equal-length, non-overlapping segments; from each segment, a frame image representing the static appearance of the video and a stacked optical-flow image representing its motion are sampled at a random position. These two kinds of images are then fed into the spatial-domain and temporal-domain convolutional neural networks, respectively, for feature extraction, and the per-segment features of each video are fused within each channel to obtain category prediction features for the spatial and temporal domains. Finally, the video action recognition result is obtained by integrating the predictions of the two channels. Through experiments, various data augmentation methods and transfer learning schemes are discussed to alleviate the over-fitting caused by insufficient training samples, and the effects of the number of segments, the pre-trained network, the segment feature fusion scheme, and the two-channel integration strategy on recognition performance are analyzed. The experimental results show that the proposed model learns human action features in complex videos more effectively and recognizes the actions more accurately.
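The pipeline described above (segment the video, sample one RGB frame and one stacked optical-flow image per segment, extract features with a spatial and a temporal CNN, fuse per-segment predictions within each channel, then integrate the two channels) can be sketched as follows. This is a minimal illustrative PyTorch implementation: the ResNet-18 backbone, three segments, ten-frame flow stack, average segment fusion, and the 1:1.5 channel weighting are assumptions made for the sketch, not parameters reported in the paper.

```python
# Minimal sketch of a segment-based two-channel (two-stream) action recognition
# network, assuming a ResNet-18 backbone and average segment fusion.
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamSegmentNet(nn.Module):
    def __init__(self, num_classes: int, num_segments: int = 3, flow_stack: int = 10):
        super().__init__()
        self.num_segments = num_segments

        # Spatial channel: a standard image CNN applied to one RGB frame per segment.
        self.spatial = models.resnet18(weights=None)
        self.spatial.fc = nn.Linear(self.spatial.fc.in_features, num_classes)

        # Temporal channel: same backbone, but the first conv accepts a stack of
        # 2 * flow_stack optical-flow channels (x and y displacement per frame).
        self.temporal = models.resnet18(weights=None)
        self.temporal.conv1 = nn.Conv2d(2 * flow_stack, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        self.temporal.fc = nn.Linear(self.temporal.fc.in_features, num_classes)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb:  (batch, segments, 3, H, W)        one frame sampled per segment
        # flow: (batch, segments, 2*stack, H, W)  stacked optical flow per segment
        b, s = rgb.shape[:2]

        # Per-segment class scores from each channel.
        spatial_scores = self.spatial(rgb.flatten(0, 1)).view(b, s, -1)
        temporal_scores = self.temporal(flow.flatten(0, 1)).view(b, s, -1)

        # Segment fusion: average the per-segment predictions within each channel.
        spatial_video = spatial_scores.mean(dim=1)
        temporal_video = temporal_scores.mean(dim=1)

        # Two-channel integration: a weighted sum of the channel predictions
        # (the 1:1.5 spatial/temporal weighting is an illustrative assumption).
        return spatial_video + 1.5 * temporal_video


if __name__ == "__main__":
    model = TwoStreamSegmentNet(num_classes=101, num_segments=3)
    rgb = torch.randn(2, 3, 3, 224, 224)    # 2 videos, 3 segments, RGB frames
    flow = torch.randn(2, 3, 20, 224, 224)  # 10 flow frames -> 20 channels
    print(model(rgb, flow).shape)           # torch.Size([2, 101])
```

In practice, the pre-trained weights, the segment fusion operator (average, max, or weighted), and the channel integration strategy are exactly the design choices the paper's experiments compare; the fixed values here are placeholders.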