A model structure integration based on a Bayesian framework for speech recognition

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4813-4816. Pub Date: 2012-03-25. DOI: 10.1109/ICASSP.2012.6288996

Sayaka Shiota, Kei Hashimoto, Yoshihiko Nankaku, K. Tokuda


Citations: 0

Abstract

This paper proposes an acoustic modeling technique for speech recognition that uses multiple model structures within a Bayesian framework. The Bayesian approach is a statistical technique that estimates reliable predictive distributions by marginalizing over model parameters, and its effectiveness in HMM-based speech recognition has been reported. Although the basic idea underlying the Bayesian approach is to treat all parameters as random variables, the conventional method still selects only one model structure. In the proposed method, multiple model structures are treated as latent variables and integrated within the Bayesian framework. Furthermore, deterministic annealing is applied to the training algorithm to estimate appropriate acoustic models. The proposed method makes effective use of multiple model structures, especially in the early stage of training, which leads to better predictive distributions and improved recognition performance.
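As a rough illustration of the idea of treating model structures as latent variables and tempering their weights with deterministic annealing, the sketch below computes a tempered posterior over a few candidate structures and uses it to combine their likelihoods. It is not the authors' training algorithm: the function names `structure_posterior` and `integrated_log_likelihood`, the temperature parameter `beta`, and the toy log-evidence values are all assumptions introduced here for illustration.

```python
import math

def structure_posterior(log_evidences, log_priors, beta=1.0):
    """Tempered posterior over candidate model structures (illustrative).

    q(m) is proportional to (p(m) * exp(L_m)) ** beta, where L_m is a
    hypothetical log-evidence (or lower bound) for structure m.
    beta < 1 flattens the weights; beta = 1 gives the ordinary posterior.
    """
    scores = [beta * (le + lp) for le, lp in zip(log_evidences, log_priors)]
    peak = max(scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - peak) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def integrated_log_likelihood(log_liks, weights):
    """Log of the mixture sum_m q(m) * p(x | m), computed via log-sum-exp."""
    terms = [math.log(w) + ll for w, ll in zip(weights, log_liks) if w > 0.0]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Toy numbers chosen for illustration only; they are not from the paper.
log_evidences = [-105.0, -100.0, -102.0]
log_priors = [math.log(1.0 / 3.0)] * 3

for beta in (0.1, 0.5, 1.0):
    weights = structure_posterior(log_evidences, log_priors, beta)
    print(beta, [round(w, 3) for w in weights])
```

With a small `beta`, every structure keeps a non-negligible weight, which mirrors the abstract's point that multiple structures are used most heavily early in training; as `beta` approaches 1, the weights concentrate on the best-scoring structure.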
