Spatio-Temporal Approach using CNN-RNN in Hand Gesture Recognition

2021 4th International Conference of Computer and Informatics Engineering (IC2IE) Pub Date: 2021-09-14 DOI: 10.1109/ic2ie53219.2021.9649108

Mochammad Rifky Gunawan, E. C. Djamal


Citations: 3

Abstract

Spatio-Temporal Approach using CNN-RNN in Hand Gesture Recognition

One mode of communication in human-computer interaction is hand gesturing captured on video. A video is a sequence of images with a frames-per-second (fps) configuration, so the displayed image can change at any moment. Hand gesture videos are recognized through the pattern in each frame and the connections between frames, so recognition treats them as images in a time sequence. There are several approaches. The single spatial approach collects the image sequence into one large image. Although it achieves good accuracy, it has problems with less responsive backgrounds and fast movement because it captures little information about how image patterns change between adjacent frames. Others consume a lot of memory. The temporal approach focuses on comparing image patterns between frames, but it still requires the spatial information, or pattern, of every frame, not only the initial one. It is therefore appropriate to combine the two approaches in motion recognition, a combination called Spatio-Temporal. Convolutional Neural Networks (CNN) perform well in image recognition, and Recurrent Neural Networks (RNN) are usually suitable for recognizing sequences and their relationships. For hand gesture recognition, this research therefore used a Spatio-Temporal approach with the CNN-RNN method: the CNN acts as the spatial stream to obtain image patterns, and the RNN acts as the temporal stream to obtain the patterns that connect them. The results showed that the combination of CNN and RNN in the Spatio-Temporal approach could classify a video as one of four hand gestures with 96.43% accuracy. The experiments resulted in a configuration of eight CNN convolution layers and two Dense layers in the RNN, with GRU and LSTM architectures.
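The division of labour between the two streams can be illustrated with a minimal NumPy sketch. It is not the authors' network: the abstract reports eight convolution layers and GRU/LSTM variants, whereas this toy uses a single convolution layer, one GRU cell without bias terms, random untrained weights, and made-up sizes (6 filters, 16 hidden units, 10 frames of 32x32 pixels). All function and variable names are hypothetical. It only shows how per-frame spatial features feed a recurrent state that is classified into four gesture classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(frame, kernels):
    """Valid 2D cross-correlation of one grayscale frame with K kernels, then ReLU."""
    k = kernels.shape[-1]                      # kernels: (K, k, k)
    # windows has shape (H-k+1, W-k+1, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(frame, (k, k))
    maps = np.einsum("hwij,cij->chw", windows, kernels)
    return np.maximum(maps, 0.0)

def spatial_features(frame, kernels):
    """Spatial stream: convolve, then global-average-pool each feature map."""
    return conv2d_valid(frame, kernels).mean(axis=(1, 2))   # shape (K,)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU update (biases omitted for brevity); p holds the weight matrices."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)             # reset gate
    h_new = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_new

def classify(video, kernels, p, w_out):
    """Temporal stream: run the GRU over per-frame features, then softmax."""
    h = np.zeros(p["Uz"].shape[0])
    for frame in video:
        h = gru_step(spatial_features(frame, kernels), h, p)
    logits = w_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Assumed, arbitrary sizes; weights are random, so the output is meaningless
# as a prediction and only demonstrates the data flow.
K, HID, CLASSES = 6, 16, 4
kernels = rng.normal(size=(K, 3, 3))
params = {
    name: rng.normal(scale=0.1, size=(HID, K if name[0] == "W" else HID))
    for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")
}
w_out = rng.normal(scale=0.1, size=(CLASSES, HID))

video = rng.random((10, 32, 32))   # 10 grayscale frames of 32x32 pixels
probs = classify(video, kernels, params, w_out)
print(probs.shape)
```

Because the hidden state is updated frame by frame, the classifier sees both what each frame looks like and how the frames follow one another, which is the property the abstract attributes to the combined approach.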
