New Hybrid Deep Learning Method to Recognize Human Action from Video

M. Islam, Sunjida Sultana, Md Jabbarul Islam
Jurnal Ilmiah Teknik Elektro Komputer dan Informatika, journal article, published 2021-09-01
DOI: 10.26555/jiteki.v7i2.21499 (https://doi.org/10.26555/jiteki.v7i2.21499)
Citations: 3

Abstract

Internet use and available bandwidth have grown tremendously in recent years. Because Internet connectivity is now inexpensive, sharing information (text, audio, and video) has become faster and more popular. Video content must be analyzed so that it can be classified for users' different purposes, and several machine learning approaches to video classification have been developed to save users time and effort. Recognizing human behavior with deep neural networks has become a popular research topic in recent years. Although video recognition has progressed significantly, many challenges remain. Convolutional neural networks (CNNs) are known to require fixed-size input, which constrains the network topology and reduces recognition accuracy. This problem has been solved for still images but not yet for video. To remove the fixed-size constraint on input video frames, we present a ten-layer stacked three-dimensional (3D) convolutional network based on spatial pyramid pooling. As the name suggests, the network consists of three parts: a ten-layer stacked 3D CNN, DenseNet, and SPPNet. We evaluated the model on the KTH dataset. The experimental results showed that it outperformed existing models for video-based behavior recognition by a 2% margin in accuracy.
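The role of spatial pyramid pooling in removing the fixed-size input constraint can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the single-channel input, the use of max pooling, the pyramid levels (1, 2, 4), and the function names `bin_edges`, `spp3d`, and `random_clip` are all assumptions made for illustration. The sketch only shows that clips of different lengths and resolutions are reduced to a feature vector of the same length.

```python
import random

def bin_edges(size, n_bins):
    """Split range(size) into n_bins non-empty (start, end) index ranges.

    Starts are rounded down and ends are rounded up, so neighbouring bins
    may overlap when size is not divisible by n_bins.
    """
    return [((i * size) // n_bins, ((i + 1) * size + n_bins - 1) // n_bins)
            for i in range(n_bins)]

def spp3d(volume, levels=(1, 2, 4)):
    """Max-pool a [T][H][W] nested list into a fixed-length feature vector.

    For each pyramid level n the volume is divided into n * n * n bins and
    the maximum of each bin is kept, so the output length is sum(n ** 3)
    over the levels, independent of T, H, and W.
    """
    t, h, w = len(volume), len(volume[0]), len(volume[0][0])
    features = []
    for n in levels:
        for t0, t1 in bin_edges(t, n):
            for h0, h1 in bin_edges(h, n):
                for w0, w1 in bin_edges(w, n):
                    features.append(max(
                        volume[i][j][k]
                        for i in range(t0, t1)
                        for j in range(h0, h1)
                        for k in range(w0, w1)
                    ))
    return features

def random_clip(t, h, w, seed=0):
    """Hypothetical stand-in for one channel of a 3D CNN feature map."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(w)] for _ in range(h)]
            for _ in range(t)]

# Two clips with different frame counts and resolutions.
short_clip = random_clip(8, 12, 16)
long_clip = random_clip(20, 30, 24)

# Both feature vectors have length 1 + 8 + 64 = 73.
print(len(spp3d(short_clip)), len(spp3d(long_clip)))
```

Because the pooled vector has a fixed length, a fully connected classifier can follow it without resizing or cropping the input frames first. In a deep learning framework the same effect would normally be obtained with an adaptive pooling layer applied per channel.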