{"title":"基于动态时间规整和神经网络的人机交互框架","authors":"D. Vishwakarma, Shamshad Ansari","doi":"10.1109/ICICI.2017.8365346","DOIUrl":null,"url":null,"abstract":"Due to incapability in speaking and hearing, there always exist a group of people in the community who face difficulties in communication. These people use some symbols and gestures to convey their messages and receive their messages, this form of communication is known as Sign Language. We have provided a solution based on dynamic time warping (DTW) for the first module and software based solution for the second module by exploiting the latest technology of Microsoft Kinect depth camera which tracks the 20 joint location of human beings. In sign to speech/text conversion block, the actor performs some valid gestures within the Kinect field of view. The gestures are taken up by the Kinect sensor and then interpreted by comparing it with already stored trained gestures in the dictionary. After the sign is recognized, it is copied to the respective word which is transferred to the speech conversion and text conversion module to produce the output. In the second block, which is a speech to sign/gesture conversion, the person speaks in Kinect field of view which is taken by the Kinect, and the system converts speech into text, and corresponding word is mapped into predefined gesture which is played on the screen. This way a disabled person can visualize the spoken word. The accuracy of sign to speech module is found to be 87%, and that of speech to gesture module is 91.203%.","PeriodicalId":369524,"journal":{"name":"2017 International Conference on Inventive Computing and Informatics (ICICI)","volume":"0 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A framework for human-computer interaction using dynamic time warping and neural network\",\"authors\":\"D. Vishwakarma, Shamshad Ansari\",\"doi\":\"10.1109/ICICI.2017.8365346\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to incapability in speaking and hearing, there always exist a group of people in the community who face difficulties in communication. These people use some symbols and gestures to convey their messages and receive their messages, this form of communication is known as Sign Language. We have provided a solution based on dynamic time warping (DTW) for the first module and software based solution for the second module by exploiting the latest technology of Microsoft Kinect depth camera which tracks the 20 joint location of human beings. In sign to speech/text conversion block, the actor performs some valid gestures within the Kinect field of view. The gestures are taken up by the Kinect sensor and then interpreted by comparing it with already stored trained gestures in the dictionary. After the sign is recognized, it is copied to the respective word which is transferred to the speech conversion and text conversion module to produce the output. In the second block, which is a speech to sign/gesture conversion, the person speaks in Kinect field of view which is taken by the Kinect, and the system converts speech into text, and corresponding word is mapped into predefined gesture which is played on the screen. This way a disabled person can visualize the spoken word. 
The accuracy of sign to speech module is found to be 87%, and that of speech to gesture module is 91.203%.\",\"PeriodicalId\":369524,\"journal\":{\"name\":\"2017 International Conference on Inventive Computing and Informatics (ICICI)\",\"volume\":\"0 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Conference on Inventive Computing and Informatics (ICICI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICICI.2017.8365346\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Inventive Computing and Informatics (ICICI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICI.2017.8365346","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
People who are unable to speak or hear face persistent difficulties in communication. They convey and receive messages through symbols and gestures; this form of communication is known as sign language. The proposed framework consists of two modules: sign-to-speech/text conversion and speech-to-sign/gesture conversion. The first module is based on dynamic time warping (DTW), and the second is a software-based mapping; both exploit the Microsoft Kinect depth camera, which tracks 20 joint locations of the human body. In the sign-to-speech/text block, the actor performs a valid gesture within the Kinect field of view. The gesture is captured by the Kinect sensor and interpreted by comparing it against the trained gestures already stored in a dictionary. Once the sign is recognized, it is mapped to the corresponding word, which is passed to the speech-conversion and text-conversion modules to produce the output. In the second block, speech-to-sign/gesture conversion, the person speaks within the Kinect field of view; the system converts the speech into text, and the corresponding word is mapped to a predefined gesture that is played on the screen. In this way, a hearing-impaired person can visualize the spoken word. The accuracy of the sign-to-speech module is found to be 87%, and that of the speech-to-gesture module is 91.203%.
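The core of the sign-to-speech/text module is the DTW comparison of a captured joint-trajectory sequence against trained gesture templates. The sketch below is a minimal illustration of that matching step under stated assumptions, not the authors' implementation: the function names, the flattened 20-joint feature vector, and the nearest-template decision rule are hypothetical choices made for clarity.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two gesture sequences.

    Each sequence is assumed to be a NumPy array of shape (frames, features),
    e.g. the 3-D coordinates of the 20 Kinect skeleton joints flattened into
    a 60-dimensional vector per frame.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def recognise_gesture(query, dictionary):
    """Return the word whose stored template is closest to the query gesture.

    `dictionary` is assumed to map each word to a list of trained template
    sequences recorded in advance.
    """
    best_word, best_dist = None, np.inf
    for word, templates in dictionary.items():
        for template in templates:
            dist = dtw_distance(query, template)
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word, best_dist
```

In such a scheme, the query would be an array of joint coordinates captured from the Kinect skeleton stream, and the word whose template yields the smallest warping cost would be forwarded to the speech- and text-conversion stages; the paper does not specify its exact feature normalization or decision rule, so these details are illustrative only.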