Emergency sign language recognition from variant of convolutional neural network (CNN) and long short term memory (LSTM) models

M. A. As’ari, N. A. J. Sufri, Guat Si Qi
{"title":"Emergency sign language recognition from variant of convolutional neural network (CNN) and long short term memory (LSTM) models","authors":"M. A. As’ari, N. A. J. Sufri, Guat Si Qi","doi":"10.26555/ijain.v10i1.1170","DOIUrl":null,"url":null,"abstract":"Sign language is the primary communication tool used by the deaf community and people with speaking difficulties, especially during emergencies. Numerous deep learning models have been proposed to solve the sign language recognition problem. Recently. Bidirectional LSTM (BLSTM) has been proposed and used in replacement of Long Short-Term Memory (LSTM) as it may improve learning long-team dependencies as well as increase the accuracy of the model. However, there needs to be more comparison for the performance of LSTM and BLSTM in LRCN model architecture in sign language interpretation applications. Therefore, this study focused on the dense analysis of the LRCN model, including 1) training the CNN from scratch and 2) modeling with pre-trained CNN, VGG-19, and ResNet50. Other than that, the ConvLSTM model, a special variant of LSTM designed for video input, has also been modeled and compared with the LRCN in representing emergency sign language recognition. Within LRCN variants, the performance of a small CNN network was compared with pre-trained VGG-19 and ResNet50V2. A dataset of emergency Indian Sign Language with eight classes is used to train the models. The model with the best performance is the VGG-19 + LSTM model, with a testing accuracy of 96.39%. Small LRCN networks, which are 5 CNN subunits + LSTM and 4 CNN subunits + BLSTM, have 95.18% testing accuracy. This performance is on par with our best-proposed model, VGG + LSTM. By incorporating bidirectional LSTM (BLSTM) into deep learning models, the ability to understand long-term dependencies can be improved. This can enhance accuracy in reading sign language, leading to more effective communication during emergencies.","PeriodicalId":52195,"journal":{"name":"International Journal of Advances in Intelligent Informatics","volume":"3 9","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advances in Intelligent Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26555/ijain.v10i1.1170","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Sign language is the primary communication tool used by the deaf community and people with speaking difficulties, especially during emergencies. Numerous deep learning models have been proposed to solve the sign language recognition problem. Recently, the Bidirectional LSTM (BLSTM) has been proposed as a replacement for the Long Short-Term Memory (LSTM), as it may improve the learning of long-term dependencies and increase model accuracy. However, the performance of LSTM and BLSTM within the LRCN (Long-term Recurrent Convolutional Network) architecture has received little comparison in sign language interpretation applications. Therefore, this study presents an in-depth analysis of the LRCN model, including 1) training the CNN from scratch and 2) modeling with pre-trained CNNs, VGG-19 and ResNet50. In addition, the ConvLSTM model, a variant of LSTM designed for video input, is modeled and compared with the LRCN for emergency sign language recognition. Within the LRCN variants, the performance of a small CNN network is compared with that of pre-trained VGG-19 and ResNet50V2. A dataset of emergency Indian Sign Language with eight classes is used to train the models. The best-performing model is VGG-19 + LSTM, with a testing accuracy of 96.39%. The small LRCN networks, 5 CNN subunits + LSTM and 4 CNN subunits + BLSTM, achieve 95.18% testing accuracy, on par with the best model, VGG-19 + LSTM. Incorporating the bidirectional LSTM (BLSTM) into deep learning models can improve the ability to capture long-term dependencies, enhancing accuracy in recognizing sign language and leading to more effective communication during emergencies.
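The abstract describes the LRCN architecture (a per-frame CNN feature extractor followed by a recurrent layer) only in prose. As a rough illustration of that idea, below is a minimal Keras sketch of a VGG-19 + LSTM variant; the sequence length, image size, LSTM width, frozen backbone, and training settings are illustrative assumptions, not the configuration reported in the study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 8                      # eight emergency ISL classes (from the abstract)
SEQ_LEN, H, W, C = 20, 224, 224, 3   # assumed input shape: frames x height x width x channels

# Pre-trained VGG-19 as a per-frame feature extractor (frozen here for simplicity).
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                  input_shape=(H, W, C), pooling="avg")
vgg.trainable = False

inputs = layers.Input(shape=(SEQ_LEN, H, W, C))
# TimeDistributed applies the CNN to each frame independently,
# producing one feature vector per frame.
x = layers.TimeDistributed(vgg)(inputs)
# The LSTM aggregates per-frame features over time; wrapping it as
# layers.Bidirectional(layers.LSTM(64)) would give the BLSTM variant.
x = layers.LSTM(64)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The ConvLSTM comparison model, as we read the abstract, would instead apply convolutional recurrence (e.g., layers.ConvLSTM2D) directly to the raw frame sequence rather than separating spatial and temporal modeling into a CNN and an LSTM.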