Real-Time Human Action Recognition Using Deep Learning Architecture
Authors: S. Kahlouche, M. Belhocine, Abdallah Menouar
Journal: Int. J. Comput. Intell. Appl.
Published: 2021-11-17
DOI: https://doi.org/10.1142/s1469026821500267
Citations: 1
Abstract
In this work, an efficient human activity recognition (HAR) algorithm based on a deep learning architecture is proposed to classify activities into seven classes. To learn spatial and temporal features from only the 3D skeleton data captured by a Microsoft Kinect camera, the proposed algorithm combines convolutional neural network (CNN) and long short-term memory (LSTM) architectures. This combination takes advantage of the LSTM for modeling temporal data and of the CNN for modeling spatial data. The captured skeleton sequences are used to create a dedicated dataset of interactive activities; these data are then transformed according to a view-invariance criterion and a symmetry criterion. To demonstrate its effectiveness, the algorithm was tested on several public datasets, where it matched and in some cases exceeded state-of-the-art performance. To assess the uncertainty of the proposed algorithm, some tools are provided and discussed to ensure its efficiency for continuous human action recognition in real time.
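The abstract does not spell out how the view-invariant transformation is computed. As an illustration only, the sketch below shows one common way to make a single skeleton frame independent of camera position and heading: translate the joints so a root joint sits at the origin, then rotate about the vertical axis so the shoulder line runs along the x axis. The function name `normalize_view`, the joint indices, and the choice of y as the vertical axis are assumptions made for this example, not details taken from the paper.

```python
import math

def normalize_view(frame, hip=0, l_shoulder=1, r_shoulder=2):
    """Return a view-normalized copy of one skeleton frame.

    frame is a list of (x, y, z) joint positions. The joint indices are
    hypothetical placeholders, not the Kinect joint numbering.
    """
    # Step 1: translate so the hip joint becomes the origin.
    hx, hy, hz = frame[hip]
    centered = [(x - hx, y - hy, z - hz) for x, y, z in frame]

    # Step 2: heading of the left-to-right shoulder vector in the
    # horizontal x-z plane (y is assumed to be the vertical axis).
    lx, _, lz = centered[l_shoulder]
    rx, _, rz = centered[r_shoulder]
    theta = math.atan2(rz - lz, rx - lx)

    # Step 3: rotate every joint by -theta about the y axis, which
    # aligns the shoulder vector with the positive x axis.
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in centered]
```

Applied to every frame of a sequence, a normalization of this kind yields the same joint coordinates whether the subject faces the camera or stands at an angle to it, which is the property a CNN-LSTM classifier would rely on when trained on skeleton data alone.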