{"title":"Comparison of LDM and HMM for an Application of a Speech","authors":"V. Mane, A. B. Patil, K. P. Paradeshi","doi":"10.1109/ARTCOM.2010.65","DOIUrl":null,"url":null,"abstract":"Automatic speech recognition (ASR) has moved from science-fiction fantasy to daily reality for citizens of technological societies. Some people seek it out, preferring dictating to typing, or benefiting from voice control of aids such as wheel-chairs. Others find it embedded in their hi-tec gadgetry – in mobile phones and car navigation systems, or cropping up in what would have until recently been human roles such as telephone booking of cinema tickets. Wherever you may meet it, computer speech recognition is here, and it’s here to stay. Most of the automatic speech recognition (ASR) systems are based on hidden Markov Model in which Guassian Mixturess model is used. The output of this model depends on subphone states. Dynamic information is typically included by appending time-derivatives to feature vectors. This approach was quite successful. This approach makes the false assumption of framewise independence of the augmented feature vectors and ignores the spatial correlations in the parametrised speech signal. This is the short coming while applying HMM for acoustic modeling for ASR. Rather than modelling individual frames of data, LDMs characterize entire segments of speech. An auto-regressive state evolution through a continuous space gives a Markovian model. The underlying dynamics, and spatial correlations between feature dimensions. LDMs are well suited to modelling smoothly varying, continuous, yet noisy trajectories such as found in measured articulatory data.","PeriodicalId":398854,"journal":{"name":"2010 International Conference on Advances in Recent Technologies in Communication and Computing","volume":"131 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 International Conference on Advances in Recent Technologies in Communication and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARTCOM.2010.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Automatic speech recognition (ASR) has moved from science-fiction fantasy to daily reality for citizens of technological societies. Some people seek it out, preferring dictating to typing, or benefiting from voice control of aids such as wheelchairs. Others find it embedded in their hi-tech gadgetry – in mobile phones and car navigation systems – or cropping up in what would until recently have been human roles, such as telephone booking of cinema tickets. Wherever you may meet it, computer speech recognition is here, and it's here to stay. Most ASR systems are based on the hidden Markov model (HMM), in which Gaussian mixture models are used and the output distribution depends on sub-phone states. Dynamic information is typically included by appending time-derivatives to the feature vectors. While this approach has been quite successful, it makes the false assumption of framewise independence of the augmented feature vectors and ignores the spatial correlations in the parametrised speech signal. This is the shortcoming of applying HMMs to acoustic modelling for ASR. Rather than modelling individual frames of data, linear dynamic models (LDMs) characterize entire segments of speech. An auto-regressive state evolution through a continuous space gives a Markovian model of the underlying dynamics, with spatial correlations between feature dimensions captured in the model parameters. LDMs are well suited to modelling smoothly varying, continuous, yet noisy trajectories such as those found in measured articulatory data.
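For concreteness, a standard formulation of the two ideas contrasted above is sketched below; the notation is the conventional one from the state-space modelling literature and is an assumption, not taken from the paper itself. The time-derivative ("delta") coefficients appended to a feature vector $c_t$ are typically computed by linear regression over a window of $\pm N$ frames:

$$\Delta c_t \;=\; \frac{\sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n})}{2\sum_{n=1}^{N} n^2}$$

An LDM, by contrast, is a linear-Gaussian state-space model in which a hidden state $x_t$ evolves auto-regressively and generates the observed feature vector $y_t$, with the state-evolution matrix $F$, observation matrix $H$, and noise covariances $Q$ and $R$ estimated per segment:

$$x_t = F x_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$
$$y_t = H x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

Under this formulation, the full covariances and the observation matrix $H$ are what allow an LDM to capture the spatial correlations between feature dimensions that the framewise-independence assumption of the HMM ignores.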