On the use of filter-bank energies as features for robust speech recognition

ISSPA '99. Proceedings of the Fifth International Symposium on Signal Processing and its Applications (IEEE Cat. No.99EX359). Pub Date: 1999-08-22. DOI: 10.1109/ISSPA.1999.815754

K. Paliwal


Citations: 20

Abstract


On the use of filter-bank energies as features for robust speech recognition

Though mel frequency cepstral coefficients (MFCCs) have been very successful in speech recognition, they have two problems: (1) they do not have any physical interpretation, and (2) liftering of cepstral coefficients, found to be highly useful in the earlier dynamic warping-based speech recognition systems, has no effect on the recognition process when used with continuous-observation Gaussian-density hidden Markov models. We propose to use the filter-bank energies (FBEs) as features. The FBEs are physically meaningful quantities and are amenable to human auditory processing such as masking. We describe procedures to decorrelate and lifter the FBEs and show that the FBEs perform at least as well as (and sometimes even better than) the MFCCs for robust speech recognition.
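Problem (2) in the abstract can be checked numerically. The sketch below is not taken from the paper; it is a minimal illustration under stated assumptions: each class is modeled by a single diagonal-covariance Gaussian (standing in for one HMM state), the features are synthetic random vectors, and the lifter is the common sinusoidal window with L = 22, which is an assumed choice. Because liftering multiplies each coefficient by a fixed nonzero weight, re-estimating the Gaussians on liftered data shifts every class log-likelihood by the same constant, so the score difference between classes does not change.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-density of each row of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def fit_diag_gauss(frames):
    """Maximum-likelihood mean and per-dimension variance of the frames."""
    return frames.mean(axis=0), frames.var(axis=0)

def sinusoidal_lifter(n_coeffs, L=22):
    """Assumed lifter window: w[n] = 1 + (L/2) * sin(pi * n / L), n = 1..n_coeffs."""
    n = np.arange(1, n_coeffs + 1)
    return 1.0 + (L / 2.0) * np.sin(np.pi * n / L)

def score_gap(train_a, train_b, test):
    """Per-frame log-likelihood difference between class A and class B models."""
    mean_a, var_a = fit_diag_gauss(train_a)
    mean_b, var_b = fit_diag_gauss(train_b)
    return diag_gauss_loglik(test, mean_a, var_a) - diag_gauss_loglik(test, mean_b, var_b)

# Synthetic stand-ins for cepstral vectors (hypothetical data, not speech).
rng = np.random.default_rng(0)
n_coeffs = 12
class_a = rng.normal(0.0, 1.0, size=(500, n_coeffs))
class_b = rng.normal(0.5, 1.5, size=(500, n_coeffs))
test = rng.normal(0.25, 1.2, size=(200, n_coeffs))

w = sinusoidal_lifter(n_coeffs)

# Train and score without liftering, then with liftering applied everywhere.
gap_plain = score_gap(class_a, class_b, test)
gap_liftered = score_gap(class_a * w, class_b * w, test * w)

max_abs_diff = float(np.max(np.abs(gap_plain - gap_liftered)))
decisions_unchanged = bool(np.array_equal(gap_plain > 0, gap_liftered > 0))
print("max |gap difference|:", max_abs_diff)
print("decisions unchanged:", decisions_unchanged)
```

The same cancellation does not hold for a distance-based matcher such as dynamic warping with a Euclidean local distance, where the weights directly rescale each coefficient's contribution; this is consistent with the abstract's remark that liftering helped in those earlier systems.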

