{"title":"Fast algorithm for isolated words recognition based on Hidden Markov model stationary distribution","authors":"Pavel Paramonov","doi":"10.1109/ISCMI.2017.8279612","DOIUrl":null,"url":null,"abstract":"Over the last few decades Hidden Markov models (HMM) became core technology in automatic speech recognition (ASR). Contemporary HMM approach is based on usage of Gaussian mixture models (GMM) as acoustic models that are capable of statistical inference of speech variability. Deep neural networks (DNN) applied to ASR as acoustic models outperformed GMM in large vocabulary speech recognition. However, conventional approaches to ASR are very computationally expensive, what makes it impossible to apply them in voice control systems on low power devices. This paper focuses on the approach to isolated words recognition with reduced computational costs, what makes it feasible for in-place recognition on low computational resources devices. All components of the isolated words recognizer are described. Quantized Mel-frequency cepstral coefficients are used as speech features. The fast algorithm of isolated words recognition is described. It is based on a stationary distribution of Hidden Markov model and has linear computational complexity. Another important feature of the proposed approach is that it requires significantly less memory to store model parameters comparing to HMM-GMM and DNN models. Algorithm performance is evaluated on TIMIT isolated words dataset. The proposed method performance is compared with the results, that showed conventional forward algorithm, HMM-GMM approach and Self-Adjustable Neural Network. Only HMM-GMM outperformed proposed stationary distribution approach.","PeriodicalId":119111,"journal":{"name":"2017 IEEE 4th International Conference on Soft Computing & Machine Intelligence (ISCMI)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 4th International Conference on Soft Computing & Machine Intelligence (ISCMI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCMI.2017.8279612","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
Over the last few decades, Hidden Markov models (HMM) have become a core technology in automatic speech recognition (ASR). The contemporary HMM approach uses Gaussian mixture models (GMM) as acoustic models capable of statistically modeling speech variability. Deep neural networks (DNN) applied to ASR as acoustic models have outperformed GMM in large-vocabulary speech recognition. However, conventional ASR approaches are computationally expensive, which makes them impractical for voice-control systems on low-power devices. This paper focuses on an approach to isolated-word recognition with reduced computational cost, making it feasible for on-device recognition with limited computational resources. All components of the isolated-word recognizer are described. Quantized Mel-frequency cepstral coefficients are used as speech features. A fast isolated-word recognition algorithm is described; it is based on the stationary distribution of a Hidden Markov model and has linear computational complexity. Another important feature of the proposed approach is that it requires significantly less memory to store model parameters compared to HMM-GMM and DNN models. Algorithm performance is evaluated on the TIMIT isolated-words dataset. The proposed method is compared with the conventional forward algorithm, the HMM-GMM approach, and a Self-Adjustable Neural Network. Only HMM-GMM outperformed the proposed stationary-distribution approach.
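To make the core idea concrete, the sketch below is a minimal illustration (not the author's exact algorithm) of scoring quantized observation indices against discrete word models using only each HMM's stationary distribution and emission probabilities, so every word is evaluated in a single linear pass over the frames. The function names, the random toy models, and the codebook-index inputs standing in for quantized MFCC features are assumptions made for this example.

```python
import numpy as np

def stationary_distribution(A):
    """Stationary distribution pi of a transition matrix A (pi = pi @ A),
    obtained as the left eigenvector of A for eigenvalue 1."""
    vals, vecs = np.linalg.eig(A.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = np.abs(pi)
    return pi / pi.sum()

def score_word(obs, pi, B, eps=1e-300):
    """Approximate log-likelihood of a quantized observation sequence.

    obs : 1-D array of codebook indices (stand-ins for quantized MFCC frames)
    pi  : stationary distribution over the HMM states, shape (N,)
    B   : discrete emission probability matrix, shape (N, K)

    Unlike the forward recursion (O(T*N^2)), each frame is scored
    independently under the stationary state distribution, giving O(T*N).
    """
    frame_probs = pi @ B[:, obs]          # shape (T,)
    return np.sum(np.log(frame_probs + eps))

def recognize(obs, models):
    """Return the word whose model gives the highest score."""
    return max(models, key=lambda w: score_word(obs, *models[w]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, K, T = 5, 32, 40                   # states, codebook size, frames
    def random_model():
        A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
        B = rng.random((N, K)); B /= B.sum(axis=1, keepdims=True)
        return stationary_distribution(A), B
    models = {"yes": random_model(), "no": random_model()}
    obs = rng.integers(0, K, size=T)      # hypothetical quantized feature stream
    print(recognize(obs, models))
```

Because the per-frame scores depend only on the fixed stationary distribution rather than on a recursively updated state posterior, storage per word reduces to one probability vector and one emission table, which is consistent with the memory savings the abstract claims relative to HMM-GMM and DNN models.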