{"title":"Achieving Real-Time Sign Language Translation Using a Smartphone's True Depth Images","authors":"Hyeonjun Park, Jong-Seok Lee, Jeonggil Ko","doi":"10.1109/COMSNETS48256.2020.9027420","DOIUrl":null,"url":null,"abstract":"Sign language is used as a visual form of communication among the deaf and is considered as an official language in many countries. While there has been many efforts to achieve efficient translation between sign and verbal languages, many of these previous work can be applied in non-mobile context or exploit RGB images which can potentially invade the users' privacy. This work presents our preliminary efforts in designing a mobile device-based sign language translation system using depth-only images. Our system performs image processing on the smartphone-collected depth images to emphasize the subject's hand and upper body gestures and exploits a convolutional neural network for feature extraction. The series of features gathered from word-representing videos are passed through a Long-Short Term Memory (LSTM) model for word-level sign language translation. We train and test our system using a total of 2,200 samples collected from 26 people for 17 words. The classification accuracy of our proposed system using the self-collected data achieves 92% with an efficient image preprocessing phase.","PeriodicalId":265871,"journal":{"name":"2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMSNETS48256.2020.9027420","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Sign language is used as a visual form of communication among the deaf and is recognized as an official language in many countries. While there have been many efforts to achieve efficient translation between sign and verbal languages, much of this previous work applies only in non-mobile contexts or relies on RGB images, which can potentially invade users' privacy. This work presents our preliminary efforts in designing a mobile device-based sign language translation system that uses depth-only images. Our system performs image processing on the smartphone-collected depth images to emphasize the subject's hand and upper-body gestures and employs a convolutional neural network for feature extraction. The sequence of features gathered from word-representing videos is passed through a Long Short-Term Memory (LSTM) model for word-level sign language translation. We train and test our system on a total of 2,200 samples collected from 26 people for 17 words. With an efficient image preprocessing phase, the classification accuracy of the proposed system on the self-collected data reaches 92%.
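To make the described pipeline concrete, the sketch below shows one plausible way to wire a per-frame CNN feature extractor into an LSTM word classifier over a sequence of preprocessed depth frames. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only specifies a CNN for feature extraction, an LSTM for word-level classification, depth-only input, and a 17-word vocabulary; all layer sizes, the backbone design, and the sequence length are assumptions.

```python
# Minimal sketch of a CNN + LSTM word-level sign classifier as described in
# the abstract. Architectural details (layer sizes, backbone, frame count)
# are assumptions for illustration only.
import torch
import torch.nn as nn

NUM_WORDS = 17      # vocabulary size reported in the abstract
FEATURE_DIM = 256   # assumed per-frame feature size
HIDDEN_DIM = 128    # assumed LSTM hidden size


class DepthSignClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Small CNN over single-channel (depth-only) frames; the real system
        # may use a deeper or pretrained backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, FEATURE_DIM), nn.ReLU(),
        )
        self.lstm = nn.LSTM(FEATURE_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, NUM_WORDS)

    def forward(self, frames):
        # frames: (batch, time, 1, H, W) preprocessed depth frames
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)  # final hidden state summarizes the video
        return self.head(h_n[-1])       # logits over the 17-word vocabulary


if __name__ == "__main__":
    # Example: classify a batch of two 30-frame depth videos at 96x96 pixels.
    logits = DepthSignClassifier()(torch.randn(2, 30, 1, 96, 96))
    print(logits.shape)  # torch.Size([2, 17])
```

Feeding the LSTM's final hidden state into a linear layer yields one prediction per video, which matches the word-level (rather than continuous) translation task described in the abstract.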