Split Acoustic Modeling in Decoder for Phoneme Recognition
R. Pradeep, K. S. Rao
2017 14th IEEE India Council International Conference (INDICON), December 2017
DOI: 10.1109/INDICON.2017.8487556
Citations: 2
Abstract
Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Much of the recent research has concentrated on reducing the computational complexity of DNN training by developing different architectures. However, the search space of the decoder in automatic speech recognition (ASR) is huge, and the decoder is also prone to substitution errors. In this work, we introduce a split decoding mechanism by creating separate sonorant and obstruent acoustic models. Speech frames detected as sonorants are fed only to the sonorant acoustic models, and frames detected as obstruents only to the obstruent acoustic models. This reduces the decoder search space in ASR and also minimises substitution errors. The sonorant class, which broadly includes the vowels, semi-vowels and nasals, is detected by exploiting the spectral flatness measure (SFM) computed on the magnitude linear prediction (LP) spectrum. The proposed split decoding method based on sonority detection decreased the phone error rate by nearly 0.7% absolute on the core TIMIT test corpus, compared to the conventional decoding used with the state-of-the-art DNN.
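The sonority cue described above relies on the spectral flatness measure of the LP magnitude spectrum: sonorants have a strongly peaked (harmonic) LP spectrum and hence a low SFM, while obstruents are noise-like and flatter. The paper does not give its implementation details, so the following is only a minimal NumPy sketch of the general idea, assuming the standard autocorrelation (Levinson-Durbin) LP analysis and the usual SFM definition as the ratio of geometric to arithmetic mean of the spectrum; the LP order, FFT size, and any decision threshold here are illustrative, not taken from the paper.

```python
import numpy as np

def lp_coefficients(frame, order=12):
    """LP coefficients of a windowed frame via the autocorrelation
    method, solved with the Levinson-Durbin recursion."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def spectral_flatness(frame, order=12, nfft=512):
    """SFM of the LP magnitude spectrum: geometric mean divided by
    arithmetic mean. Values near 1 indicate a flat (noise-like,
    obstruent-like) spectrum; values near 0 a peaky (harmonic,
    sonorant-like) one."""
    a = lp_coefficients(frame, order)
    # LP magnitude spectrum |H(e^jw)| = 1 / |A(e^jw)|
    spectrum = 1.0 / (np.abs(np.fft.rfft(a, nfft)) + 1e-12)
    gm = np.exp(np.mean(np.log(spectrum + 1e-12)))
    am = np.mean(spectrum)
    return gm / am
```

A frame would then be routed to the sonorant or obstruent acoustic model by comparing its SFM against a tuned threshold; a harmonic frame yields a markedly lower SFM than a noise frame of the same length.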