A study on model-based error rate estimation for automatic speech recognition

C. Huang, Hsiao-Chuan Wang, Chin-Hui Lee
Journal: IEEE Trans. Speech Audio Process., pp. 581-589
Published: 2003-11-01 (journal article)
DOI: 10.1109/TSA.2003.818030 (https://doi.org/10.1109/TSA.2003.818030)
Citations: 17

Abstract

A model-based framework for estimating classification error rates is proposed for speech and speaker recognition. It aims to predict the run-time performance of a hidden Markov model (HMM) based recognition system for a given task vocabulary and grammar without running recognition experiments on a separate set of test samples, which is highly desirable both in theory and in practice. However, the error rate of an HMM-based speech recognition system has no closed-form expression, owing to the complexity of the multi-class comparison process and the need for dynamic time warping to handle varied speech patterns. To alleviate this difficulty, we propose a one-dimensional model-based misclassification measure that evaluates the distance between a particular model of interest and a combination of many of its competing models. The error rate for a class characterized by the HMM is then the value of a smoothed zero-one error function evaluated at the misclassification measure, and the overall error rate of the task vocabulary can be computed as a function of all the available class error rates. The key is to evaluate the misclassification measure from the parameters of environment-matched models, without running recognition experiments; the models are adapted with very limited data, which could be just the test utterance itself. In this paper, we show how the misclassification measure can be approximated by first computing the distance between two Gaussian mixture densities, then between two HMMs with Gaussian mixture state observation densities, and finally between two sequences of HMMs. The misclassification measure is then converted into a classification error rate. Comparing the error rates obtained from actual experiments with those of the new framework shows that the proposed algorithm accurately estimates the classification error rate for many types of speech and speaker recognition problems. Within the same framework, it is also demonstrated that the error rate of a recognition system in a noisy environment can be predicted.
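
The abstract gives no formulas, so the following is only a minimal sketch of the pipeline it describes: pairwise model distances are reduced to a one-dimensional misclassification measure per class, a smoothed zero-one function maps that measure to a class error rate, and the class error rates are combined into an overall rate. The soft-minimum combination of competitors, the sigmoid smoothing, the parameters `eta`, `gamma` and `theta`, the prior-weighted average, and the distance values are all assumptions chosen for illustration (they follow common minimum-classification-error conventions), not the paper's actual definitions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function (a smoothed zero-one step)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_min_distance(distances, eta=1.0):
    """Soft minimum of the distances from one model to its competitors.

    As eta grows, the result approaches min(distances), so the closest
    competitor dominates. The shift by the minimum avoids underflow.
    """
    m = min(distances)
    total = sum(math.exp(-eta * (d - m)) for d in distances)
    return m - math.log(total / len(distances)) / eta


def class_error_rate(distances, gamma=1.0, theta=0.0, eta=1.0):
    """Estimated error rate for one class (hypothetical parameterization).

    The misclassification measure is the negated soft-min distance:
    a well-separated model has a strongly negative measure and
    therefore an estimated error rate close to zero.
    """
    measure = -soft_min_distance(distances, eta)
    return sigmoid(gamma * measure + theta)


def overall_error_rate(distance_matrix, priors=None, **kwargs):
    """Prior-weighted average of the per-class error rates."""
    n = len(distance_matrix)
    if priors is None:
        priors = [1.0 / n] * n  # assume equally likely classes
    rates = []
    for i, row in enumerate(distance_matrix):
        competitors = [d for j, d in enumerate(row) if j != i]
        rates.append(class_error_rate(competitors, **kwargs))
    total = sum(p * r for p, r in zip(priors, rates))
    return total, rates


# Made-up symmetric distances between three models; models 1 and 2
# are close to each other and so should be the most confusable.
example_distances = [
    [0.0, 4.0, 6.0],
    [4.0, 0.0, 1.0],
    [6.0, 1.0, 0.0],
]
total_rate, per_class_rates = overall_error_rate(example_distances)
print(total_rate, per_class_rates)
```

In this sketch no test utterances are decoded: the estimate depends only on the model-to-model distances, which is the property the abstract emphasizes. In practice the distances would have to come from the Gaussian mixture, HMM, and HMM-sequence distance computations that the paper develops, and the smoothing parameters would need calibration.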