Towards realizing gesture-to-speech conversion with a HMM-based bilingual speech synthesis system
Hongwu Yang, Xiaochun An, Dong Pei, Yitong Liu
2014 International Conference on Orange Technologies, 2014-11-20
DOI: 10.1109/ICOT.2014.6956608 (https://doi.org/10.1109/ICOT.2014.6956608)
Citations: 9
Abstract
This paper presents a gesture-to-speech conversion system intended to ease communication between healthy people and people with speech disorders. An improved speeded-up robust features (SURF) algorithm, used together with a Kinect sensor, performs static gesture recognition. In parallel, a Hidden Markov Model (HMM) based Mandarin-Tibetan bilingual speech synthesis system is built using speaker adaptive training. A set of semantic rules is designed for the static gestures, and Chinese or Tibetan context-dependent labels are generated for each recognized gesture according to these rules. The bilingual synthesis system then converts the recognized gestures into Mandarin or Tibetan speech from the context-dependent labels. Tests show that the system achieves a static gesture recognition rate of 97.1%. In subjective evaluation, the synthesized speech obtains a mean opinion score (MOS) of 4.0.
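The step between recognition and synthesis, where a recognized static gesture is mapped through semantic rules to text in the chosen language, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the gesture IDs, the phrases (written in pinyin and Wylie transliteration), the `SEMANTIC_RULES` table, and the `gesture_to_request` helper do not come from the paper. The sketch also stops at text selection; it does not produce the full context-dependent labels that an HMM synthesizer consumes.

```python
from dataclasses import dataclass

# Hypothetical semantic rules: gesture ID -> phrase per language.
# IDs and phrases are illustrative only, not taken from the paper.
SEMANTIC_RULES = {
    "gesture_01": {"mandarin": "ni3 hao3", "tibetan": "bkra shis bde legs"},
    "gesture_02": {"mandarin": "xie4 xie5", "tibetan": "thugs rje che"},
}


@dataclass(frozen=True)
class SynthesisRequest:
    """Text and target language handed to a (not shown) speech synthesizer."""
    language: str
    text: str


def gesture_to_request(gesture_id: str, language: str) -> SynthesisRequest:
    """Look up the phrase for a recognized gesture in the requested language."""
    rule = SEMANTIC_RULES.get(gesture_id)
    if rule is None:
        raise KeyError(f"no semantic rule for gesture {gesture_id!r}")
    if language not in rule:
        raise ValueError(f"unsupported language {language!r}")
    return SynthesisRequest(language=language, text=rule[language])


if __name__ == "__main__":
    # A recognizer (not shown) is assumed to have returned "gesture_01".
    print(gesture_to_request("gesture_01", "tibetan"))
```

Keeping the rules in a plain lookup table means the same recognized gesture can be routed to either language by changing one argument, which mirrors the bilingual design described in the abstract.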