New Hybrid Deep Learning Method to Recognize Human Action from Video

Jurnal Ilmiah Teknik Elektro Komputer dan Informatika, published 2021-09-01, DOI: 10.26555/jiteki.v7i2.21499

M. Islam, Sunjida Sultana, Md Jabbarul Islam


Citations: 3

Abstract

Internet use and available bandwidth have grown tremendously in recent years. Because connectivity is inexpensive, sharing text, audio, and video has become more popular and faster. The resulting volume of video must be analyzed so that it can be classified for users' different purposes, and several machine learning approaches to video classification have been developed to save users time and effort. Recognizing human actions with deep neural networks has become an active research topic. Although video recognition has made significant progress, many challenges remain. Convolutional neural networks (CNNs) with fully connected classification layers require a fixed-size input, which constrains the network design and, because frames must be cropped or resized to fit, can reduce recognition accuracy. This problem has been solved for still images but not yet for video. To handle variable-size video frames, we present a ten-layer stacked three-dimensional (3D) convolutional network combined with spatial pyramid pooling. The network consists of three parts: a ten-layer stacked 3D CNN, DenseNet, and SPPNet. We evaluated the model on the KTH dataset, where it outperformed existing video-based human action recognition models by a margin of 2% in accuracy.
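The key idea behind the fixed-size fix is spatial pyramid pooling: a feature map of any spatial size is divided into a fixed number of bins at several grid resolutions, and each bin is pooled to one value per channel, so the output length no longer depends on frame size. The following is a minimal NumPy sketch of that idea for a 3D feature map shaped (C, T, H, W). The function name `spatial_pyramid_pool_3d`, the pyramid levels (1, 2, 4), and the choice to max-pool over the whole temporal axis inside each spatial bin are assumptions for illustration, not the authors' published configuration.

```python
import numpy as np


def spatial_pyramid_pool_3d(features, levels=(1, 2, 4)):
    """Pool a (C, T, H, W) feature map into a fixed-length vector.

    Hypothetical helper for illustration. For each pyramid level n, the
    spatial plane (H, W) is divided into an n x n grid of bins. Each bin is
    max-pooled over its spatial extent and over the whole temporal axis T
    (an assumption of this sketch). The output length is
    C * sum(n * n for n in levels), regardless of the input's T, H, and W.
    """
    _, _, h, w = features.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor for the start and ceil for the end keep every bin
            # non-empty, even when H or W is smaller than n.
            h0, h1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                w0, w1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                region = features[:, :, h0:h1, w0:w1]
                # Max over time, height, and width leaves one value per channel
                pooled.append(region.max(axis=(1, 2, 3)))
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical feature maps with 8 channels but different T, H, W
    clip_a = rng.standard_normal((8, 16, 56, 56))
    clip_b = rng.standard_normal((8, 10, 30, 40))
    print(spatial_pyramid_pool_3d(clip_a).shape)  # (168,)
    print(spatial_pyramid_pool_3d(clip_b).shape)  # (168,)
```

Because the vector length depends only on the channel count and the pyramid levels (here 8 × (1 + 4 + 16) = 168), a fully connected classifier placed after this layer can accept clips whose frame size differs, which is the property the abstract relies on.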



