A left-to-right HDP-HMM with HDPM emissions

2014 48th Annual Conference on Information Sciences and Systems (CISS), Pub Date: 2014-03-19, DOI: 10.1109/CISS.2014.6814172

A. Torbati, J. Picone, M. Sobel


Citations: 7

Abstract


A left-to-right HDP-HMM with HDPM emissions

Nonparametric Bayesian models use a Bayesian framework to learn the model complexity automatically from the data and eliminate the need for a complex model selection process. The Hierarchical Dirichlet Process hidden Markov model (HDP-HMM) is the nonparametric Bayesian equivalent of an HMM. However, HDP-HMM is restricted to an ergodic topology and uses a Dirichlet Process Model (DPM) to achieve a mixture distribution-like model. For applications such as speech recognition, where we deal with ordered sequences, it is desirable to impose a left-to-right structure on the model to improve its ability to model the sequential nature of the speech signal. In this paper, we introduce three enhancements to HDP-HMM: (1) a left-to-right structure: needed for sequential decoding of speech, (2) non-emitting initial and final states: required for modeling finite length sequences, (3) HDP mixture emissions: allows sharing of data across states. The latter is particularly important for speech recognition because Gaussian mixture models have been very effective at modeling speaker variability. Further, due to the nature of language, some models occur infrequently and have a small number of data points associated with them, even for large corpora. Sharing allows these models to be estimated more accurately. We demonstrate that this new HDP-HMM model produces a 15% increase in likelihoods and a 15% relative reduction in error rate on a phoneme classification task based on the TIMIT Corpus.
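The left-to-right constraint and the non-emitting final state described above can be illustrated with a small sketch. This is not the authors' construction: it assumes a truncated stick-breaking draw for a shared weight vector and enforces the left-to-right structure by masking backward transitions and renormalizing each row. The function names, the truncation level, and the concentration value `alpha` are all hypothetical choices made for illustration.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Draw truncated stick-breaking weights (assumed prior, for illustration).

    Each break takes a Beta(1, alpha) fraction of the remaining stick;
    the last component receives whatever is left so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def left_to_right_transitions(n_states, alpha, rng):
    """Build an n_states x (n_states + 1) transition matrix.

    Column n_states is a non-emitting exit state. Row i keeps only the
    shared weights for targets j >= i (no backward transitions) and is
    renormalized to sum to 1.
    """
    beta = stick_breaking(alpha, n_states + 1, rng)
    rows = []
    for i in range(n_states):
        allowed = beta[i:]            # self-loop, later states, and exit
        total = sum(allowed)
        rows.append([0.0] * i + [w / total for w in allowed])
    return rows


if __name__ == "__main__":
    matrix = left_to_right_transitions(4, alpha=1.0, rng=random.Random(0))
    for row in matrix:
        print(" ".join(f"{p:.3f}" for p in row))
```

Every row of the resulting matrix is zero to the left of the diagonal, so a state sequence can only stay in place, move forward, or terminate through the exit column, which is what allows finite-length sequences to be modeled.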
