Connectionist Temporal Classification Model for Dynamic Hand Gesture Recognition using RGB and Optical flow Data

The International Arab Journal of Information Technology, Pub Date: 2020-07-01, DOI: 10.34028/iajit/17/4/8

S. Patel, R. Makwana


Citation count: 0

Abstract


Automatic classification of dynamic hand gestures is challenging because gestures vary widely across classes, the input has low resolution, and the gestures are performed with the fingers. These challenges have drawn many researchers to the area. Recent approaches use deep neural networks for implicit feature extraction and a softmax layer for classification. In this paper, we propose a method based on a two-dimensional convolutional neural network that performs detection and classification of hand gestures simultaneously from multimodal Red, Green, Blue, Depth (RGBD) and optical flow data. The network passes its features to a Long Short-Term Memory (LSTM) recurrent network, which generates frame-by-frame probabilities, and a Connectionist Temporal Classification (CTC) network computes the loss. We compute optical flow from the Red, Green, Blue (RGB) data to capture the motion information present in the video. The CTC model efficiently evaluates all possible alignments of a hand gesture via dynamic programming and checks frame-to-frame consistency of the gesture's visual similarity in the unsegmented input stream. The CTC network finds the most probable sequence of frames for a gesture class. The frame with the highest probability value is selected from the CTC network by max decoding. The entire CTC network is trained end-to-end by calculating the CTC loss for gesture recognition. We use the challenging Vision for Intelligent Vehicles and Applications (VIVA) dataset for dynamic hand gesture recognition, which is captured with RGB and depth data. On this VIVA dataset, our proposed hand gesture recognition technique outperforms competing state-of-the-art algorithms and achieves an accuracy of 86%.
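The two CTC steps mentioned in the abstract, summing over all alignments by dynamic programming and max decoding, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python version of the standard CTC forward algorithm and best-path decoding, and it assumes the per-frame probabilities (which the paper obtains from the LSTM) are already available as lists. The function names, the choice of index 0 for the blank label, and the toy inputs are all assumptions made for illustration.

```python
import math

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def ctc_greedy_decode(probs, blank=BLANK):
    """Max (best-path) decoding: take the most probable label in every
    frame, merge consecutive repeats, then drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded, previous = [], None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def ctc_neg_log_likelihood(probs, target, blank=BLANK):
    """CTC loss for one sequence, computed with the forward algorithm.

    probs  : list of T per-frame probability distributions over labels
    target : list of label indices, without blanks
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    extended = [blank]
    for label in target:
        extended += [label, blank]
    S, T = len(extended), len(probs)

    # alpha[s] = total probability of all frame alignments that end at
    # position s of the extended target after the current frame.
    alpha = [0.0] * S
    alpha[0] = probs[0][extended[0]]
    if S > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, T):
        new_alpha = [0.0] * S
        for s in range(S):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][extended[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

As a usage example, with two frames `[[0.6, 0.4], [0.3, 0.7]]` and target `[1]`, the three valid alignments (label-blank, blank-label, label-label) have probabilities 0.12, 0.42, and 0.28, so the loss is the negative log of about 0.82, and `ctc_greedy_decode` returns `[1]`. A practical system would work in log space to avoid underflow on long videos.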

