Prosodic Information-Assisted DNN-based Mandarin Spontaneous-Speech Recognition
Yu-Chih Deng, Cheng-Hsin Lin, Y. Liao, Yih-Ru Wang, Sin-Horng Chen
2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA), published 2020-11-05
DOI: 10.1109/O-COCOSDA50338.2020.9295010 (https://doi.org/10.1109/O-COCOSDA50338.2020.9295010)
Citation count: 1
Abstract
This paper continues the method proposed in [1] and updates its traditional HMM-based ASR to a state-of-the-art DNN-based ASR. It uses prosodic information to assist DNN-based Mandarin spontaneous-speech recognition, especially to alleviate the serious interference that disfluencies and paralinguistic phenomena cause during decoding. The approach adopts a hierarchical prosodic model (HPM), composed of break-syntax, break-acoustic, syllable prosodic, and prosodic state models, to rescore and improve the first-pass decoding output of a TDNN-f+RNNLM system. At the same time, it generates word, Part of Speech (POS), Punctuation Mark (PM), tone, break type, and prosodic state tags for further use. Experimental results showed that the HPM-based system not only reduced the word error rate (WER) from the previous best of 41.8% [1] to 21.2%, but also accurately detected the underlying POS, PMs, and tones, with error rates of 10.9%, 12.6%, and 2.3%, respectively. These results confirm that the proposed method is promising for tackling Mandarin spontaneous-speech recognition.
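The abstract does not specify how the HPM scores are combined with the first-pass scores. The following minimal Python sketch assumes a common log-linear combination over an N-best list, purely for illustration: the `Hypothesis` class, the score names (`am`, `lm`, `hpm`), the toy tokens, and the weights are all hypothetical and are not taken from the paper. It also shows how a token error rate is computed by Levenshtein alignment, and the relative reduction implied by the reported WER drop from 41.8% to 21.2% (20.6 / 41.8, about 49%).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Hypothesis:
    """A first-pass hypothesis with named log-domain scores (hypothetical fields)."""
    words: List[str]
    log_scores: Dict[str, float] = field(default_factory=dict)


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of log scores; scores without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * s for name, s in hyp.log_scores.items())


def rescore(nbest: Sequence[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Pick the N-best entry with the highest log-linear combined score."""
    return max(nbest, key=lambda h: combined_score(h, weights))


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), via Levenshtein DP."""
    # prev[j] = edit distance between the ref prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute, or match at cost 0
        prev = cur
    return prev[-1] / max(len(ref), 1)


def relative_reduction(old: float, new: float) -> float:
    """Fraction of the old error rate removed by the new system."""
    return (old - new) / old


# Toy 2-best list (hypothetical scores): acoustic + LM scores alone prefer
# the second entry, but adding a weighted prosodic score flips the decision.
nbest = [
    Hypothesis(["wo", "men", "qu"], {"am": -120.0, "lm": -15.0, "hpm": -30.0}),
    Hypothesis(["wo", "men", "chu"], {"am": -115.0, "lm": -18.0, "hpm": -40.0}),
]
without_hpm = rescore(nbest, {"am": 1.0, "lm": 0.8})
with_hpm = rescore(nbest, {"am": 1.0, "lm": 0.8, "hpm": 0.5})
print(without_hpm.words, with_hpm.words)
print(round(relative_reduction(41.8, 21.2), 3))
```

In practice such weights would typically be tuned on a development set, and a system like the one described may rescore lattices rather than N-best lists; the paper's actual combination scheme may differ from this sketch.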