Sign Language Recognition using Deep Learning

M. Mahyoub, F. Natalia, S. Sudirman, J. Mustafina
{"title":"使用深度学习的手语识别","authors":"M. Mahyoub, F. Natalia, S. Sudirman, J. Mustafina","doi":"10.1109/DeSE58274.2023.10100055","DOIUrl":null,"url":null,"abstract":"Sign Language Recognition is a form of action recognition problem. The purpose of such a system is to automatically translate sign words from one language to another. While much work has been done in the SLR domain, it is a broad area of study and numerous areas still need research attention. The work that we present in this paper aims to investigate the suitability of deep learning approaches in recognizing and classifying words from video frames in different sign languages. We consider three sign languages, namely Indian Sign Language, American Sign Language, and Turkish Sign Language. Our methodology employs five different deep learning models with increasing complexities. They are a shallow four-layer Convolutional Neural Network, a basic VGG16 model, a VGG16 model with Attention Mechanism, a VGG16 model with Transformer Encoder and Gated Recurrent Units-based Decoder, and an Inflated 3D model with the same. We trained and tested the models to recognize and classify words from videos in three different sign language datasets. From our experiment, we found that the performance of the models relates quite closely to the model's complexity with the Inflated 3D model performing the best. Furthermore, we also found that all models find it more difficult to recognize words in the American Sign Language dataset than the others.","PeriodicalId":346847,"journal":{"name":"2023 15th International Conference on Developments in eSystems Engineering (DeSE)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sign Language Recognition using Deep Learning\",\"authors\":\"M. Mahyoub, F. Natalia, S. Sudirman, J. Mustafina\",\"doi\":\"10.1109/DeSE58274.2023.10100055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sign Language Recognition is a form of action recognition problem. The purpose of such a system is to automatically translate sign words from one language to another. While much work has been done in the SLR domain, it is a broad area of study and numerous areas still need research attention. The work that we present in this paper aims to investigate the suitability of deep learning approaches in recognizing and classifying words from video frames in different sign languages. We consider three sign languages, namely Indian Sign Language, American Sign Language, and Turkish Sign Language. Our methodology employs five different deep learning models with increasing complexities. They are a shallow four-layer Convolutional Neural Network, a basic VGG16 model, a VGG16 model with Attention Mechanism, a VGG16 model with Transformer Encoder and Gated Recurrent Units-based Decoder, and an Inflated 3D model with the same. We trained and tested the models to recognize and classify words from videos in three different sign language datasets. From our experiment, we found that the performance of the models relates quite closely to the model's complexity with the Inflated 3D model performing the best. 
Furthermore, we also found that all models find it more difficult to recognize words in the American Sign Language dataset than the others.\",\"PeriodicalId\":346847,\"journal\":{\"name\":\"2023 15th International Conference on Developments in eSystems Engineering (DeSE)\",\"volume\":\"94 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 15th International Conference on Developments in eSystems Engineering (DeSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DeSE58274.2023.10100055\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 15th International Conference on Developments in eSystems Engineering (DeSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DeSE58274.2023.10100055","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Sign Language Recognition (SLR) is a form of the action recognition problem. The purpose of such a system is to automatically translate sign words from one language to another. While much work has been done in the SLR domain, it is a broad area of study, and numerous areas still need research attention. The work we present in this paper investigates the suitability of deep learning approaches for recognizing and classifying words from video frames in different sign languages. We consider three sign languages, namely Indian Sign Language, American Sign Language, and Turkish Sign Language. Our methodology employs five deep learning models of increasing complexity: a shallow four-layer Convolutional Neural Network, a basic VGG16 model, a VGG16 model with an Attention Mechanism, a VGG16 model with a Transformer Encoder and a Gated Recurrent Units-based Decoder, and an Inflated 3D model with the same encoder-decoder configuration. We trained and tested the models to recognize and classify words from videos in three different sign language datasets. From our experiments, we found that model performance relates quite closely to model complexity, with the Inflated 3D model performing best. Furthermore, all models found it more difficult to recognize words in the American Sign Language dataset than in the others.
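
To make the model lineup above concrete, here is a minimal, hypothetical sketch of the frame-features-plus-recurrent-decoder idea behind the fourth model: per-frame VGG16 features are pooled into a sequence and classified by a GRU. Everything here (class name, hyperparameters, and the omission of the attention and Transformer-encoder stages) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of word-level sign classification from video clips:
# per-frame VGG16 features feed a GRU whose final hidden state is classified.
# Requires torch and torchvision >= 0.13 (for the `weights` argument).
import torch
import torch.nn as nn
from torchvision.models import vgg16


class VGG16GRUClassifier(nn.Module):
    def __init__(self, num_classes: int, hidden_size: int = 512):
        super().__init__()
        backbone = vgg16(weights=None)       # untrained backbone for the sketch
        self.features = backbone.features    # conv layers only, 512 output channels
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.gru = nn.GRU(input_size=512, hidden_size=hidden_size,
                          batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        x = self.features(clip.reshape(b * t, c, h, w))  # (b*t, 512, h', w')
        x = self.pool(x).flatten(1).reshape(b, t, 512)   # per-frame feature sequence
        _, h_n = self.gru(x)                             # final hidden state
        return self.head(h_n[-1])                        # (b, num_classes)


# Usage: classify a dummy 16-frame clip over an assumed 50-word vocabulary.
model = VGG16GRUClassifier(num_classes=50)
logits = model(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 50])
```

The paper's actual fourth model also inserts a Transformer encoder between the frame features and the GRU-based decoder, and the fifth replaces the 2D backbone with an Inflated 3D network; this sketch only shows the shared frame-to-sequence skeleton.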