Factor analysis of acoustic features for streamed hidden Markov modeling

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). Pub Date: 2007-12-01. DOI: 10.1109/ASRU.2007.4430079

Chuan-Wei Ting, Jen-Tzung Chien


Citations: 7

Abstract


This paper presents a new streamed hidden Markov model (HMM) framework for speech recognition. Factor analysis (FA) is performed to discover the common factors of the acoustic features. The streaming regularities are governed by the correlations between features, which are captured by the common factors. Features corresponding to the same factor are generated by the same HMM state. Accordingly, we use multiple Markov chains to represent the variation trends in cepstral features. We develop an FA streamed HMM (FASHMM) that goes beyond the conventional HMM, which assumes that all features in a speech frame are emitted by the same state. This streamed HMM is more refined than the factorial HMM, in which the streaming was determined empirically. We also present a new decoding algorithm for FASHMM speech recognition. In this manner, we realize flexible Markov chains for an input sequence of multivariate Gaussian mixture observations. In the experiments, the proposed method reduces word error rate by up to 36%.
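The streaming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rule of assigning each feature to the factor with the largest absolute loading, the independence of streams given their states, and the use of a single diagonal Gaussian per state (rather than the Gaussian mixtures the abstract mentions) are all assumptions made for illustration. The loading matrix is taken as a given, hypothetical input, such as the output of a factor analysis routine.

```python
import math


def assign_streams(loadings):
    """Group feature indices by the factor with the largest absolute loading.

    loadings[j][k] is the loading of feature j on factor k (hypothetical input).
    Returns a dict mapping factor index -> list of feature indices (one stream).
    """
    streams = {}
    for j, row in enumerate(loadings):
        k = max(range(len(row)), key=lambda idx: abs(row[idx]))
        streams.setdefault(k, []).append(j)
    return streams


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def frame_loglik(frame, streams, stream_states, emissions):
    """Log-likelihood of one frame when each stream occupies its own state.

    Assumes streams are independent given their states, so the per-stream
    log-likelihoods add. emissions[k][s] = (mean, var) for stream k, state s.
    """
    total = 0.0
    for k, dims in streams.items():
        mean, var = emissions[k][stream_states[k]]
        total += diag_gauss_logpdf([frame[j] for j in dims], mean, var)
    return total


# Toy example: four features, two factors (all numbers are made up).
loadings = [[0.9, 0.1], [0.8, -0.2], [0.1, 0.7], [-0.3, 0.85]]
streams = assign_streams(loadings)

# Two states per stream, each a diagonal Gaussian over that stream's features.
emissions = {
    0: [([0.0, 0.0], [1.0, 1.0]), ([2.0, 2.0], [1.0, 1.0])],
    1: [([0.0, 0.0], [1.0, 1.0]), ([-1.0, -1.0], [1.0, 1.0])],
}
frame = [2.0, 2.0, 0.0, 0.0]

# Conventional HMM: every feature is emitted by the same state index.
ll_same = frame_loglik(frame, streams, {0: 0, 1: 0}, emissions)
# Streamed HMM: each stream may sit in a different state at this frame.
ll_split = frame_loglik(frame, streams, {0: 1, 1: 0}, emissions)
```

In this toy frame, the first stream matches its second state while the second stream matches its first state, so letting the streams occupy different states scores the frame higher than forcing one shared state. That is the kind of flexibility the abstract contrasts with the conventional HMM.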
