{"title":"Research on Deep Learning with Gesture Recognition and LSTM in Sign Language","authors":"Yi-Jiuan Chung, Chih-Hsiung Shen","doi":"10.1109/ICKII55100.2022.9983520","DOIUrl":null,"url":null,"abstract":"Sign language is a tool for the hearing impaired to communicate with each other. It is a channel for expressing thoughts and emotions, and also one of the ways they communicate with ordinary people. However, not everyone can read sign language. For those who do not understand sign language, it is difficult to receive its meaning quickly. At the same time, it also causes inconvenience for the hearing impaired. Thus, gesture recognition combined with deep learning techniques of Long Short-Term Memory (LSTM) is used to translate sign language into sentences with the correct meaning in this study. Then, the convenience of communication between hearing-impaired and ordinary people can be enhanced. People can more easily understand the sign language expressions of hearing-impaired and improve their willingness to communicate and interact. The study of Sign Language Recognition (SLR) is to translate the gesture and continuity of the sign language for expressing the semantics, which provides a convenient tool for communication. In this study, we constructed a complete recognition model by combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) neural network to complete continuous recognition work. An ordered image sequence is extracted from the video and converted into a vector through the image database for training and learning sign language using the powerful image recognition capabilities of CNN. Next, the LSTM model is used to connect with the fully connected layer of CNN to complete the accomplished semantic recognition. In particular, the concept of Recurrent Neural Network (RNN) is suitable for time series data processing and the construction of sequence data learning. After making modifications to the traditional RNN architecture, the LSTM performs better in terms of memory and appropriate data length. We built gesture and sign language datasets and adopted the CNN-LSTM recognition method. As a result, a higher recognition rate was achieved with a smaller training set, which meets the needs of real-time SLR systems.","PeriodicalId":352222,"journal":{"name":"2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII )","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII )","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICKII55100.2022.9983520","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Sign language is a tool for the hearing impaired to communicate with each other. It is a channel for expressing thoughts and emotions, and also one of the ways they communicate with hearing people. However, not everyone can read sign language, and those who do not understand it find it difficult to grasp its meaning quickly, which in turn inconveniences the hearing impaired. In this study, gesture recognition is therefore combined with the Long Short-Term Memory (LSTM) deep learning technique to translate sign language into sentences with the correct meaning, making communication between hearing-impaired and hearing people more convenient: people can more easily understand sign-language expressions, which improves the willingness to communicate and interact. Sign Language Recognition (SLR) translates the gestures of sign language and their temporal continuity into the intended semantics, providing a convenient tool for communication. In this study, we constructed a complete recognition model that combines a Convolutional Neural Network (CNN) with an LSTM neural network to perform continuous recognition. An ordered image sequence is extracted from the video and converted into feature vectors via the image database, using the powerful image recognition capability of the CNN for training and learning sign language. The LSTM is then connected to the fully connected layer of the CNN to complete the semantic recognition. In particular, the Recurrent Neural Network (RNN) is well suited to processing time-series data and learning from sequential data; the LSTM, a modification of the traditional RNN architecture, performs better at retaining memory over an appropriate sequence length. We built gesture and sign language datasets and adopted the CNN-LSTM recognition method. As a result, a higher recognition rate was achieved with a smaller training set, which meets the needs of real-time SLR systems.
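The abstract does not include the authors' implementation, but the pipeline it describes (a CNN frame encoder whose fully connected output feeds an LSTM for sequence-level classification) can be illustrated with a minimal PyTorch sketch. All names (e.g. CNNLSTMSignRecognizer), layer sizes, and the input shape below are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code) of a CNN-LSTM sign recognizer:
# a CNN encodes each frame of an ordered image sequence into a feature
# vector, and an LSTM consumes the feature sequence to predict a label.
import torch
import torch.nn as nn

class CNNLSTMSignRecognizer(nn.Module):
    def __init__(self, num_classes: int, feature_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        # CNN frame encoder: maps each RGB frame to a feature vector.
        # The final Linear layer plays the role of the CNN's fully
        # connected layer that the abstract says feeds the LSTM.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim), nn.ReLU(),
        )
        # LSTM consumes the ordered sequence of per-frame features.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)    # final hidden state summarizes the clip
        return self.classifier(h_n[-1])   # logits over sign classes

# Usage: a batch of 2 clips, 16 frames each, 64x64 RGB.
model = CNNLSTMSignRecognizer(num_classes=10)
logits = model(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

Reshaping the clip to (batch * time) lets one CNN encode every frame in a single pass; the LSTM then restores the temporal dimension, which is what makes this architecture suitable for the continuous recognition the paper targets.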