{"title":"基于动态时间规整和神经网络的人机交互框架","authors":"D. Vishwakarma, Shamshad Ansari","doi":"10.1109/ICICI.2017.8365346","DOIUrl":null,"url":null,"abstract":"Due to incapability in speaking and hearing, there always exist a group of people in the community who face difficulties in communication. These people use some symbols and gestures to convey their messages and receive their messages, this form of communication is known as Sign Language. We have provided a solution based on dynamic time warping (DTW) for the first module and software based solution for the second module by exploiting the latest technology of Microsoft Kinect depth camera which tracks the 20 joint location of human beings. In sign to speech/text conversion block, the actor performs some valid gestures within the Kinect field of view. The gestures are taken up by the Kinect sensor and then interpreted by comparing it with already stored trained gestures in the dictionary. After the sign is recognized, it is copied to the respective word which is transferred to the speech conversion and text conversion module to produce the output. In the second block, which is a speech to sign/gesture conversion, the person speaks in Kinect field of view which is taken by the Kinect, and the system converts speech into text, and corresponding word is mapped into predefined gesture which is played on the screen. This way a disabled person can visualize the spoken word. The accuracy of sign to speech module is found to be 87%, and that of speech to gesture module is 91.203%.","PeriodicalId":369524,"journal":{"name":"2017 International Conference on Inventive Computing and Informatics (ICICI)","volume":"0 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A framework for human-computer interaction using dynamic time warping and neural network\",\"authors\":\"D. Vishwakarma, Shamshad Ansari\",\"doi\":\"10.1109/ICICI.2017.8365346\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to incapability in speaking and hearing, there always exist a group of people in the community who face difficulties in communication. These people use some symbols and gestures to convey their messages and receive their messages, this form of communication is known as Sign Language. We have provided a solution based on dynamic time warping (DTW) for the first module and software based solution for the second module by exploiting the latest technology of Microsoft Kinect depth camera which tracks the 20 joint location of human beings. In sign to speech/text conversion block, the actor performs some valid gestures within the Kinect field of view. The gestures are taken up by the Kinect sensor and then interpreted by comparing it with already stored trained gestures in the dictionary. After the sign is recognized, it is copied to the respective word which is transferred to the speech conversion and text conversion module to produce the output. In the second block, which is a speech to sign/gesture conversion, the person speaks in Kinect field of view which is taken by the Kinect, and the system converts speech into text, and corresponding word is mapped into predefined gesture which is played on the screen. This way a disabled person can visualize the spoken word. 
The accuracy of sign to speech module is found to be 87%, and that of speech to gesture module is 91.203%.\",\"PeriodicalId\":369524,\"journal\":{\"name\":\"2017 International Conference on Inventive Computing and Informatics (ICICI)\",\"volume\":\"0 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 International Conference on Inventive Computing and Informatics (ICICI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICICI.2017.8365346\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Inventive Computing and Informatics (ICICI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICI.2017.8365346","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
People who are unable to speak or hear face persistent difficulties in communication. They convey and receive messages through symbols and gestures; this form of communication is known as sign language. The proposed framework consists of two modules: sign-to-speech/text conversion and speech-to-sign/gesture conversion. The first module is based on dynamic time warping (DTW), and the second is a software-based mapping; both exploit the Microsoft Kinect depth camera, which tracks 20 joint locations of the human body. In the sign-to-speech/text block, the actor performs a valid gesture within the Kinect field of view. The gesture is captured by the Kinect sensor and interpreted by comparing it against the trained gestures already stored in a dictionary. Once the sign is recognized, it is mapped to the corresponding word, which is passed to the speech-conversion and text-conversion modules to produce the output. In the second block, speech-to-sign/gesture conversion, the person speaks within the Kinect field of view; the system converts the speech into text, and the corresponding word is mapped to a predefined gesture that is played on the screen. In this way, a hearing-impaired person can visualize the spoken word. The accuracy of the sign-to-speech module is found to be 87%, and that of the speech-to-gesture module is 91.203%.
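The core of the sign-to-speech/text module is the DTW comparison of a captured joint-trajectory sequence against trained gesture templates. The sketch below is a minimal illustration of that matching step under stated assumptions, not the authors' implementation: the function names, the flattened 20-joint feature vector, and the nearest-template decision rule are hypothetical choices made for clarity.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two gesture sequences.

    Each sequence is assumed to be a NumPy array of shape (frames, features),
    e.g. the 3-D coordinates of the 20 Kinect skeleton joints flattened into
    a 60-dimensional vector per frame.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned.
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def recognise_gesture(query, dictionary):
    """Return the word whose stored template is closest to the query gesture.

    `dictionary` is assumed to map each word to a list of trained template
    sequences recorded in advance.
    """
    best_word, best_dist = None, np.inf
    for word, templates in dictionary.items():
        for template in templates:
            dist = dtw_distance(query, template)
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word, best_dist
```

In such a scheme, the query would be an array of joint coordinates captured from the Kinect skeleton stream, and the word whose template yields the smallest warping cost would be forwarded to the speech- and text-conversion stages; the paper does not specify its exact feature normalization or decision rule, so these details are illustrative only.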