Sign Language Recognition and Translation Method based on VTN
Wu Qin, Xue Mei, Yuming Chen, Qihang Zhang, Yanyin Yao, S. Hu
2021 International Conference on Digital Society and Intelligent Systems (DSInS), published 2021-12-03
DOI: 10.1109/dsins54396.2021.9670588
Citations: 8
Abstract
Sign language recognition plays an important role in real-time sign language translation, communication for deaf people, education, and human-computer interaction. However, vision-based sign language recognition faces difficulties such as insufficient data, large network models, and poor timeliness. We use VTN (Video Transformer Net) to construct a lightweight sign language translation network. We build a dataset called CSL-BS (Chinese Sign Language - Bank and Station) and a two-way VTN to train on isolated sign language, and compare it with I3D (Inflated 3D ConvNet). I3D and VTN are then used as feature extraction modules to extract features from continuous sign language sequences, which serve as the input to the continuous sign language translation decoding network (seq2seq). On CSL-BS, the two-way VTN achieves 87.9% accuracy versus 84.2% for the two-way I3D, and its recognition speed is 46.8% higher. For continuous sign language translation, VTN_seq2seq reaches 73.5% accuracy versus 71.2% for I3D_seq2seq, with recognition times of 13.91 s and 26.54 s respectively.
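The pipeline the abstract describes — a per-clip video feature extractor (VTN or I3D) whose output sequence feeds a seq2seq decoder — can be sketched in outline. This is a minimal illustrative stand-in, not the authors' implementation: the feature extractor is replaced by average pooling plus a fixed random projection, the decoder by a toy greedy loop, and all dimensions and vocabulary entries are assumptions.

```python
import numpy as np

FEAT_DIM = 64  # assumed feature size; the paper does not state VTN's output width
VOCAB = ["<sos>", "<eos>", "bank", "station", "where"]  # illustrative gloss vocabulary

def extract_clip_features(frames: np.ndarray) -> np.ndarray:
    """Map a clip of frames (T, H, W, C) to one feature vector (FEAT_DIM,).

    Stands in for the VTN/I3D feature extraction module: temporal average
    pooling followed by a fixed random projection keeps the sketch
    self-contained and deterministic.
    """
    rng = np.random.default_rng(0)                      # fixed seed = fixed "weights"
    proj = rng.standard_normal((frames[0].size, FEAT_DIM))
    pooled = frames.mean(axis=0).ravel()                # average over time, flatten
    return pooled @ proj

def encode_sequence(clips: list) -> np.ndarray:
    """Stack per-clip features into a (num_clips, FEAT_DIM) sequence,
    i.e. the input the seq2seq decoding network would consume."""
    return np.stack([extract_clip_features(c) for c in clips])

def greedy_decode(features: np.ndarray, max_len: int = 5) -> list:
    """Toy greedy seq2seq decode: pool the feature sequence into a context
    vector and emit vocabulary tokens until <eos> or max_len."""
    rng = np.random.default_rng(1)
    w_out = rng.standard_normal((FEAT_DIM, len(VOCAB)))
    context = features.mean(axis=0)                     # stand-in for attention
    out = []
    for _ in range(max_len):
        logits = context @ w_out
        tok = VOCAB[int(np.argmax(logits))]
        if tok == "<eos>":
            break
        out.append(tok)
        context = context * 0.9                         # nudge state between steps
    return out

# Usage: three 4-frame 8x8 RGB clips -> (3, 64) features -> gloss tokens.
clips = [np.ones((4, 8, 8, 3)) for _ in range(3)]
feats = encode_sequence(clips)
tokens = greedy_decode(feats)
```

The design point carried over from the paper is the decoupling: the recognition speedup (46.8%) comes entirely from swapping the feature extractor, while the seq2seq decoder is unchanged between the VTN_seq2seq and I3D_seq2seq variants.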