Emergency sign language recognition from variant of convolutional neural network (CNN) and long short term memory (LSTM) models

International Journal of Advances in Intelligent Informatics | Pub Date: 2024-02-29 | DOI: 10.26555/ijain.v10i1.1170

M. A. As’ari, N. A. J. Sufri, Guat Si Qi

Citations: 0

Abstract

Emergency sign language recognition from variant of convolutional neural network (CNN) and long short term memory (LSTM) models

Sign language is the primary means of communication for the deaf community and for people with speech difficulties, and it is especially critical during emergencies. Numerous deep learning models have been proposed for sign language recognition. Recently, bidirectional LSTM (BLSTM) has been proposed as a replacement for long short-term memory (LSTM), as it may improve the learning of long-term dependencies and increase model accuracy. However, few studies have compared LSTM and BLSTM within the long-term recurrent convolutional network (LRCN) architecture for sign language interpretation. Therefore, this study presents an in-depth analysis of the LRCN model, covering 1) a CNN trained from scratch and 2) pre-trained CNNs (VGG-19 and ResNet50). In addition, the ConvLSTM model, a variant of LSTM designed for video input, was built and compared with the LRCN on emergency sign language recognition. Within the LRCN variants, a small CNN was compared with pre-trained VGG-19 and ResNet50V2. An emergency Indian Sign Language dataset with eight classes was used to train the models. The best-performing model was VGG-19 + LSTM, with a testing accuracy of 96.39%. The small LRCN networks, 5 CNN subunits + LSTM and 4 CNN subunits + BLSTM, both reached 95.18% testing accuracy, on par with the best model, VGG-19 + LSTM. Incorporating BLSTM into deep learning models may improve their ability to capture long-term dependencies, which could increase sign language recognition accuracy and support more effective communication during emergencies.
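The LRCN data flow described in the abstract (per-frame features, then a recurrent layer read in one or both directions, then an eight-class output) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the weights are random and untrained, `frame_features` is a hypothetical stand-in for the CNN (or VGG-19/ResNet) encoder, biases are omitted, and the frame size, clip length and hidden size are arbitrary values chosen only for illustration.

```python
import math
import random

NUM_CLASSES = 8  # the abstract reports eight emergency sign classes


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def frame_features(frame):
    """Hypothetical stand-in for the CNN encoder: one mean value per pixel row."""
    return [sum(row) / len(row) for row in frame]


class LSTMCell:
    """LSTM cell with input, forget, output and candidate gates (biases omitted)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Each gate has its own weights over the concatenation [x_t, h_{t-1}].
        self.weights = {gate: random_matrix(hidden_dim, input_dim + hidden_dim, rng)
                        for gate in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of the input and the previous hidden state
        i = [sigmoid(v) for v in matvec(self.weights["i"], z)]
        f = [sigmoid(v) for v in matvec(self.weights["f"], z)]
        o = [sigmoid(v) for v in matvec(self.weights["o"], z)]
        cand = [math.tanh(v) for v in matvec(self.weights["c"], z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Return the final hidden state after reading the whole sequence."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def lrcn_predict(frames, forward_cell, head, backward_cell=None):
    """Encode each frame, summarise the clip recurrently, return class probabilities."""
    feats = [frame_features(frame) for frame in frames]
    summary = forward_cell.run(feats)
    if backward_cell is not None:
        # Bidirectional variant: a second cell reads the clip last frame first,
        # and the two summaries are concatenated (doubling the summary size).
        summary = summary + backward_cell.run(feats[::-1])
    logits = matvec(head, summary)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
FRAME_H, FRAME_W, NUM_FRAMES, HIDDEN = 4, 6, 10, 5  # illustrative sizes only
video = [[[rng.random() for _ in range(FRAME_W)] for _ in range(FRAME_H)]
         for _ in range(NUM_FRAMES)]

lstm_probs = lrcn_predict(video, LSTMCell(FRAME_H, HIDDEN, rng),
                          random_matrix(NUM_CLASSES, HIDDEN, rng))
blstm_probs = lrcn_predict(video, LSTMCell(FRAME_H, HIDDEN, rng),
                           random_matrix(NUM_CLASSES, 2 * HIDDEN, rng),
                           backward_cell=LSTMCell(FRAME_H, HIDDEN, rng))
print(len(lstm_probs), len(blstm_probs))
```

The only structural difference between the two variants in this sketch is the second cell that reads the clip in reverse, which doubles the size of the summary passed to the classification head. In a real system the frame encoder and recurrent layers would be trained jointly or fine-tuned, typically with a deep learning framework rather than hand-written loops.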

Source journal

International Journal of Advances in Intelligent Informatics (subject area: Computer Science, Computer Vision and Pattern Recognition)

CiteScore

3.00

Self-citation rate

0.00%

Articles published