Improvements on Mel-Frequency Cepstrum Minimum-Mean-Square-Error Noise Suppressor for Robust Speech Recognition

2008 6th International Symposium on Chinese Spoken Language Processing. Published 2008-12-30. DOI: 10.1109/CHINSL.2008.ECP.29

Dong Yu, L. Deng, Jian Wu, Y. Gong, A. Acero

Citations: 8

Abstract

We recently developed a nonlinear, feature-domain noise reduction algorithm for environment-robust speech recognition, based on the minimum mean square error (MMSE) criterion applied to Mel-frequency cepstra (MFCC). The algorithm operates on the power spectral magnitude of the filter-bank outputs and, as demonstrated on the Aurora-3 corpora, outperforms the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah in both recognition accuracy and efficiency. This paper serves two purposes. First, we show that the algorithm is effective on large-vocabulary tasks with triphone acoustic models. Second, we report two improvements to the suppression rule of the original MFCC-MMSE noise suppressor: smoothing the gain over previous frames to prevent abrupt frame-to-frame gain changes, and adjusting the gain function according to the noise power so that suppression is aggressive when the noise level is high and conservative when it is low. We also propose an efficient and effective parameter-tuning algorithm, the step-adaptive discriminative learning algorithm (SADLA), to adjust the parameters used by the noise tracker and the suppressor. On an in-house large-vocabulary noisy speech database with a clean-trained model, we observed a 46% relative word error rate (WER) reduction, which corresponds to a 16% relative WER reduction over the original MFCC-MMSE noise suppressor. On the Aurora-3 corpora, we observed a 6% relative WER reduction over the original MFCC-MMSE algorithm, or a 30% relative WER reduction over the cepstral mean normalization (CMN) baseline.
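The abstract names three mechanisms without giving their formulas: frame-to-frame gain smoothing, a noise-dependent gain adjustment, and step-adaptive parameter tuning. The sketch below illustrates one plausible shape for each, under loudly stated assumptions. The raw per-channel gain is assumed to come from the MFCC-MMSE estimator, which is not reproduced here. The first-order recursive smoother, the exponent-based noise adaptation, and its dB thresholds (`low_db`, `high_db`, `beta_min`, `beta_max`) are hypothetical choices. `step_adaptive_tune` is a generic Rprop-style search used as a stand-in; it is not the paper's SADLA, whose update rules and objective are not given in the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # one value per Mel filter-bank channel


def noise_adaptive_gain(gain: float, noise_db: float, low_db: float = 40.0,
                        high_db: float = 70.0, beta_min: float = 0.5,
                        beta_max: float = 1.5) -> float:
    """Reshape a raw gain in [0, 1] with an exponent that grows with noise power.

    Hypothetical rule: beta < 1 moves the gain toward 1 (conservative, low noise);
    beta > 1 moves it toward 0 (aggressive, high noise).
    """
    # w rises linearly from 0 (noise <= low_db) to 1 (noise >= high_db)
    w = min(max((noise_db - low_db) / (high_db - low_db), 0.0), 1.0)
    beta = beta_min + w * (beta_max - beta_min)
    return gain ** beta


def suppress(noisy: Sequence[Frame], raw_gain: Sequence[Frame],
             noise_power: Sequence[Frame], alpha: float = 0.7,
             floor: float = 0.05) -> List[Frame]:
    """Apply noise-adapted, frame-smoothed gains to filter-bank outputs."""
    out: List[Frame] = []
    prev: Frame = []
    for t, (y, g_raw, n) in enumerate(zip(noisy, raw_gain, noise_power)):
        # Per-channel noise-dependent reshaping (noise power converted to dB).
        g = [noise_adaptive_gain(gi, 10.0 * math.log10(max(ni, 1e-10)))
             for gi, ni in zip(g_raw, n)]
        # First-order recursive smoothing over frames to avoid abrupt gain jumps.
        if t > 0:
            g = [alpha * gp + (1.0 - alpha) * gi for gp, gi in zip(prev, g)]
        prev = g  # the smoothed (pre-floor) gain carries over to the next frame
        out.append([max(gi, floor) * yi for gi, yi in zip(g, y)])
    return out


def step_adaptive_tune(params: Sequence[float], steps: Sequence[float],
                       error_fn: Callable[[List[float]], float], iters: int = 50,
                       grow: float = 1.2, shrink: float = 0.5) -> Tuple[List[float], float]:
    """Rprop-style coordinate search (a stand-in, not the paper's SADLA).

    Each parameter has its own step: it grows after a move that lowers the
    error and is reversed and shrunk after a move that does not.
    """
    params, steps = list(params), list(steps)
    best = error_fn(params)
    for _ in range(iters):
        for i in range(len(params)):
            trial = list(params)
            trial[i] += steps[i]
            err = error_fn(trial)
            if err < best:
                params, best = trial, err
                steps[i] *= grow
            else:
                steps[i] *= -shrink
    return params, best
```

In a setting like the one the abstract describes, `error_fn` would presumably wrap the suppressor and return a recognition error measure (for example, WER on a held-out development set) for a candidate setting of parameters such as `alpha` or the noise thresholds; that wiring is an assumption, not something the abstract specifies. A derivative-free search of this kind fits such an objective because WER is not differentiable with respect to the suppressor parameters.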
