Title: Deep neural networks for Kannada phoneme recognition
Authors: R. Pradeep, K. S. Rao
Venue: 2016 Ninth International Conference on Contemporary Computing (IC3)
Publication date: 2016-08-01
DOI: 10.1109/IC3.2016.7880202
URL: https://doi.org/10.1109/IC3.2016.7880202
Citations: 11
Abstract
Deep neural network (DNN) based speech recognizers have recently replaced Gaussian Mixture Model (GMM) based systems as the state of the art. Developing a phonetic engine and enhancing its performance can lead to significant improvements in Automatic Speech Recognition (ASR). However, little work has been reported on developing a phonetic engine for a large-vocabulary Kannada speech corpus. This paper presents a comparative study of three speech recognition baselines: HMM-GMM, HMM-ANN and HMM-DNN. Our first set of experiments uses the Kannada speech corpus, which contains continuous utterances recorded in three modes: read mode, lecture mode and conversation mode. Context-independent phone modeling is carried out with each of the three baselines and evaluated on the different modes of the corpus. Phone Error Rate (PER) is measured and compared across all three baselines. Acoustic modeling with the HMM-DNN baseline shows a significant improvement of about 7–8% over the HMM-GMM and HMM-ANN baselines.
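The abstract does not say how Phone Error Rate is computed. As a minimal sketch, assuming the conventional definition (substitutions, deletions and insertions from a Levenshtein alignment, divided by the number of reference phones), it could look like the following. The function name `phone_error_rate` and the phone labels are illustrative and are not taken from the paper.

```python
def phone_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Assumption: this is the conventional edit-distance definition of PER;
    the paper's exact scoring setup is not given in the abstract.
    """
    if not reference:
        raise ValueError("reference must contain at least one phone")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j]. It starts as the cost of inserting j phones.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference phones seen so far
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical phone sequences, for illustration only.
ref = ["k", "a", "n", "n", "a", "d", "a"]
hyp = ["k", "a", "n", "a", "d", "a"]
print(phone_error_rate(ref, hyp))  # one deletion over seven reference phones
```

Because insertions are counted against the reference length, PER under this definition can exceed 1.0 when the hypothesis is much longer than the reference.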