{"title":"Comparison of LDM and HMM for an Application of a Speech","authors":"V. Mane, A. B. Patil, K. P. Paradeshi","doi":"10.1109/ARTCOM.2010.65","DOIUrl":null,"url":null,"abstract":"Automatic speech recognition (ASR) has moved from science-fiction fantasy to daily reality for citizens of technological societies. Some people seek it out, preferring dictating to typing, or benefiting from voice control of aids such as wheel-chairs. Others find it embedded in their hi-tec gadgetry – in mobile phones and car navigation systems, or cropping up in what would have until recently been human roles such as telephone booking of cinema tickets. Wherever you may meet it, computer speech recognition is here, and it’s here to stay. Most of the automatic speech recognition (ASR) systems are based on hidden Markov Model in which Guassian Mixturess model is used. The output of this model depends on subphone states. Dynamic information is typically included by appending time-derivatives to feature vectors. This approach was quite successful. This approach makes the false assumption of framewise independence of the augmented feature vectors and ignores the spatial correlations in the parametrised speech signal. This is the short coming while applying HMM for acoustic modeling for ASR. Rather than modelling individual frames of data, LDMs characterize entire segments of speech. An auto-regressive state evolution through a continuous space gives a Markovian model. The underlying dynamics, and spatial correlations between feature dimensions. LDMs are well suited to modelling smoothly varying, continuous, yet noisy trajectories such as found in measured articulatory data.","PeriodicalId":398854,"journal":{"name":"2010 International Conference on Advances in Recent Technologies in Communication and Computing","volume":"131 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 International Conference on Advances in Recent Technologies in Communication and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARTCOM.2010.65","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Automatic speech recognition (ASR) has moved from science-fiction fantasy to daily reality for citizens of technological societies. Some people seek it out, preferring dictating to typing, or benefiting from voice control of aids such as wheelchairs. Others find it embedded in their hi-tech gadgetry – in mobile phones and car navigation systems – or cropping up in what would until recently have been human roles, such as telephone booking of cinema tickets. Wherever you may meet it, computer speech recognition is here, and it's here to stay. Most ASR systems are based on the hidden Markov model (HMM), in which Gaussian mixture models are used and the output distribution depends on sub-phone states. Dynamic information is typically included by appending time-derivatives to the feature vectors. While this approach has been quite successful, it makes the false assumption of framewise independence of the augmented feature vectors and ignores the spatial correlations in the parametrised speech signal. This is the shortcoming of applying HMMs to acoustic modelling for ASR. Rather than modelling individual frames of data, linear dynamic models (LDMs) characterize entire segments of speech. An auto-regressive state evolution through a continuous space gives a Markovian model of the underlying dynamics, with spatial correlations between feature dimensions captured in the model parameters. LDMs are well suited to modelling smoothly varying, continuous, yet noisy trajectories such as those found in measured articulatory data.
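For concreteness, a standard formulation of the two ideas contrasted above is sketched below; the notation is the conventional one from the state-space modelling literature and is an assumption, not taken from the paper itself. The time-derivative ("delta") coefficients appended to a feature vector $c_t$ are typically computed by linear regression over a window of $\pm N$ frames:

$$\Delta c_t \;=\; \frac{\sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n})}{2\sum_{n=1}^{N} n^2}$$

An LDM, by contrast, is a linear-Gaussian state-space model in which a hidden state $x_t$ evolves auto-regressively and generates the observed feature vector $y_t$, with the state-evolution matrix $F$, observation matrix $H$, and noise covariances $Q$ and $R$ estimated per segment:

$$x_t = F x_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q)$$
$$y_t = H x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R)$$

Under this formulation, the full covariances and the observation matrix $H$ are what allow an LDM to capture the spatial correlations between feature dimensions that the framewise-independence assumption of the HMM ignores.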