Real-time hand posture and gesture-based touchless automotive user interface using deep learning

2017 IEEE Intelligent Vehicles Symposium (IV), Pub Date: 2017-06-01, DOI: 10.1109/IVS.2017.7995825

V. John, Makoto Umetsu, Ali Boyali, S. Mita, Masayuki Imanishi, Norio Sanma, Syunsuke Shibata


Citations: 9

Abstract

In this study, a vision-based in-car entertainment user interface is presented. The user interface is designed using hand posture and gesture recognition algorithms in a deep learning framework. The hand posture recognition algorithm is formulated using a convolutional neural network to perform the fundamental tasks in the user interface. The hand gesture recognition algorithm is formulated using a long-term recurrent convolutional neural network to support more detailed, intuitive interaction with the touchless automotive user interface. In a recurrent deep learning framework, the gesture frames are typically taken from a uniformly sampled image sequence. In this work, the recurrent structure is enhanced using a reduced number of input frames captured from the image sequence. These reduced input frames, or key frames, represent the action present in the video sequence. Sparse dictionary learning provides reliable key frame extraction from video sequences. However, sparse dictionary learning is computationally expensive and is individually optimized for every video sequence. In this paper, we propose to approximate sparse dictionary learning using a non-linear regression framework. A multilayer perceptron is used to model the non-linear regression framework. The optimal neural network architecture is identified after a detailed evaluation. We evaluate the proposed recognition methods on public datasets. The proposed methods yield recognition accuracies of 92% and 90% for postures and gestures, respectively. The combined hand posture and gesture recognition takes 82 ms, which is reasonable for real-time implementation.
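The contrast the abstract draws between uniformly sampled frames and a reduced set of key frames can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how per-frame importance is represented, so the `scores` list below is a hypothetical stand-in for the output of a regressor (such as the multilayer perceptron mentioned above), and the function names are illustrative only. The last helper simply converts the reported 82 ms latency into an approximate processing rate.

```python
from typing import List, Sequence


def uniform_sample_indices(num_frames: int, k: int) -> List[int]:
    """Baseline: pick k frame indices evenly spaced over the sequence."""
    if not 1 <= k <= num_frames:
        raise ValueError("k must be between 1 and num_frames")
    if k == 1:
        return [0]
    step = (num_frames - 1) / (k - 1)
    return [round(i * step) for i in range(k)]


def key_frame_indices(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames and return them in temporal order.

    `scores` is a hypothetical per-frame importance value; in the paper's
    setting it would come from a learned model, which is not reproduced here.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def throughput_hz(latency_ms: float) -> float:
    """Convert a per-inference latency in milliseconds to inferences per second."""
    return 1000.0 / latency_ms


# Made-up scores for an 8-frame clip (assumption, for illustration only).
scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6]

print(uniform_sample_indices(len(scores), 4))  # [0, 2, 5, 7]
print(key_frame_indices(scores, 4))            # [1, 3, 5, 7]
print(round(throughput_hz(82.0), 1))           # 12.2
```

With these made-up scores, uniform sampling picks frames 0, 2, 5 and 7 regardless of content, while score-based selection keeps frames 1, 3, 5 and 7. A latency of 82 ms corresponds to roughly 12 recognitions per second.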

