Sign Language Recognition Using Landmark Detection, GRU and LSTM

Subhalaxmi Chakraborty, Prayosi Paul, Suparna Bhattacharjee, Soumadeep Sarkar, Arindam Chakraborty
{"title":"基于标记检测、GRU和LSTM的手语识别","authors":"Subhalaxmi Chakraborty, Prayosi Paul, Suparna Bhattacharjee, Soumadeep Sarkar, Arindam Chakraborty","doi":"10.15864/ajec.3305","DOIUrl":null,"url":null,"abstract":"Speech impairment is a kind of disability, affects individual's ability to communicate with each other. People with this problem use sign language for their communication. Though communication through sign language has been taken care of, there exists communication gap between signed\n and non-signed people. To overcome this type of complexity researchers are trying to develop systems using deep learning approach. The main objective of this paper is subject to implement a vision-based application that offers translation of sign language to voice message and text to reduce\n the gap between two kinds of people mentioned above. The proposed model extracts temporal and spatial features after taking video sequences. To extract the spatial features, MediaPipe Holistic has been used that consists of several solutions for the detecting face, had and pose landmarks.\n Different kind of RNN (Recurrent Neural Network) like LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) have been used is to train on temporal features. By using both models and American Signed Language, 99% accuracy has been achieved. The experimental result shows that the recognition\n method with MediaPipe Holistic followed by GRU or LSTM can achieve a high recognition rate that meets the need of a Sign Language Recognition system that on the real-time basis. Based on the expectation, this analysis will facilitate creation of intelligent- based Sign Language Recognition\n systems and knowledge accumulation and provide direction to guide to the correct path.","PeriodicalId":280977,"journal":{"name":"American Journal of Electronics & Communication","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Sign Language Recognition Using Landmark Detection, GRU and LSTM\",\"authors\":\"Subhalaxmi Chakraborty, Prayosi Paul, Suparna Bhattacharjee, Soumadeep Sarkar, Arindam Chakraborty\",\"doi\":\"10.15864/ajec.3305\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech impairment is a kind of disability, affects individual's ability to communicate with each other. People with this problem use sign language for their communication. Though communication through sign language has been taken care of, there exists communication gap between signed\\n and non-signed people. To overcome this type of complexity researchers are trying to develop systems using deep learning approach. The main objective of this paper is subject to implement a vision-based application that offers translation of sign language to voice message and text to reduce\\n the gap between two kinds of people mentioned above. The proposed model extracts temporal and spatial features after taking video sequences. To extract the spatial features, MediaPipe Holistic has been used that consists of several solutions for the detecting face, had and pose landmarks.\\n Different kind of RNN (Recurrent Neural Network) like LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) have been used is to train on temporal features. By using both models and American Signed Language, 99% accuracy has been achieved. 
The experimental result shows that the recognition\\n method with MediaPipe Holistic followed by GRU or LSTM can achieve a high recognition rate that meets the need of a Sign Language Recognition system that on the real-time basis. Based on the expectation, this analysis will facilitate creation of intelligent- based Sign Language Recognition\\n systems and knowledge accumulation and provide direction to guide to the correct path.\",\"PeriodicalId\":280977,\"journal\":{\"name\":\"American Journal of Electronics & Communication\",\"volume\":\"59 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Electronics & Communication\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15864/ajec.3305\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Electronics & Communication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15864/ajec.3305","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Speech impairment is a disability that affects an individual's ability to communicate, and people with this impairment use sign language to do so. Although sign languages enable communication, a gap remains between signers and non-signers. To bridge this gap, researchers are developing systems based on deep learning. The main objective of this paper is to implement a vision-based application that translates sign language into voice messages and text, reducing the gap between the two groups mentioned above. The proposed model extracts spatial and temporal features from video sequences. To extract the spatial features, MediaPipe Holistic is used, which bundles solutions for detecting face, hand, and pose landmarks. Recurrent Neural Networks (RNNs) of different kinds, namely LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), are then trained on the temporal features. Using both models on American Sign Language data, 99% accuracy was achieved. The experimental results show that recognition with MediaPipe Holistic followed by a GRU or LSTM achieves a recognition rate high enough for a real-time Sign Language Recognition system. It is expected that this analysis will facilitate the creation of intelligent Sign Language Recognition systems, support knowledge accumulation, and point further work in the right direction.
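As a concrete illustration of the spatial-feature step, the sketch below extracts the face, hand, and pose landmarks that MediaPipe Holistic reports for each frame and flattens them into one feature vector per frame. This is a minimal sketch, not the authors' code: the zero-fill fallback for undetected landmarks and the 1662-dimensional layout (33 pose landmarks × 4 values, 468 face and 2 × 21 hand landmarks × 3 values each) are common conventions for this pipeline, not details given in the abstract.

```python
# Minimal sketch of per-frame landmark extraction with MediaPipe Holistic.
# The 1662-feature layout (33*4 + 468*3 + 2*21*3) is an assumption, not a
# figure reported in the paper.
import cv2
import numpy as np
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_keypoints(results):
    """Flatten pose, face, and both hand landmark sets into one vector,
    zero-filling any set that was not detected in the frame."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])  # shape (1662,)

def video_to_sequence(path, max_frames=30):
    """Read a video and return a (num_frames, 1662) array, one row per frame."""
    cap = cv2.VideoCapture(path)
    frames = []
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while len(frames) < max_frames:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads frames as BGR.
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            frames.append(extract_keypoints(results))
    cap.release()
    return np.stack(frames)
```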
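The temporal step can then be a small recurrent classifier over those per-frame landmark vectors. The abstract does not publish the exact architecture, so the layer sizes, 30-frame window, and class count below are assumptions for illustration; passing `layers.LSTM` instead of `layers.GRU` gives the LSTM variant the authors also evaluate.

```python
# Illustrative sketch of a GRU/LSTM classifier over landmark sequences.
# All hyperparameters here are assumptions, not values from the paper.
import tensorflow as tf
from tensorflow.keras import layers

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 30, 1662, 10  # assumed values

def build_model(cell=layers.GRU):
    """Stack two recurrent layers over the (frames, features) sequence and
    classify the sign with a softmax head."""
    return tf.keras.Sequential([
        layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
        cell(64, return_sequences=True),  # per-frame hidden states
        cell(128),                        # final sequence summary
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_model(layers.GRU)  # or build_model(layers.LSTM)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

A GRU has fewer parameters per unit than an LSTM, which often makes it faster to train on modest-sized sign datasets, consistent with the paper comparing the two cells on the same features.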