Mask estimation employing Posterior-based Representative Mean for missing-feature speech recognition with time-varying background noise

2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373398

Wooil Kim, J. Hansen


Citations: 4

Abstract

This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in time-varying background noise conditions. Conventional mask estimation methods based on noise estimates and spectral subtraction fail to estimate the mask reliably in such conditions. The proposed method uses a Posterior-based Representative Mean (PRM) vector to determine the reliability of the input speech spectrum; the PRM vector is obtained as a weighted sum of the mean parameters of the speech model, with the posterior probabilities as weights. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for feature compensation [1]. Experimental results demonstrate that the proposed mask estimation method is considerably more effective than conventional methods at improving speech recognition performance in time-varying background noise conditions. With the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +36.29% and +30.45% average relative improvements in WER for speech babble and background music conditions, respectively, compared to conventional mask estimation methods.
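The PRM computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal-covariance Gaussian mixture, assumes the posteriors are computed with the noise-corrupted model while the averaged means come from the clean speech model, and uses a hypothetical threshold rule for the reliability decision, since the abstract does not state the actual rule. All function names, parameter names, and numeric values are invented for the example.

```python
import numpy as np

def component_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x.

    Assumes a diagonal-covariance Gaussian mixture (an assumption of this
    sketch). Shapes: x (D,), weights (K,), means (K, D), variances (K, D).
    """
    diff = x - means
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * variances) + diff ** 2 / variances, axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def representative_mean(posteriors, means):
    """Weighted sum of component means, with posteriors as the weights."""
    return posteriors @ means

def estimate_mask(x, prm, threshold):
    """Hypothetical reliability rule: a spectral component counts as reliable
    (True) when it exceeds the representative mean by less than `threshold`.
    The paper's actual decision rule is not given in the abstract."""
    return (x - prm) < threshold

# Toy two-component, two-dimensional example with made-up numbers.
weights = np.array([0.5, 0.5])
noisy_means = np.array([[0.0, 0.0], [10.0, 10.0]])   # assumed noise-corrupted model
variances = np.ones((2, 2))
clean_means = np.array([[-1.0, -1.0], [9.0, 4.0]])   # assumed clean speech model

x = np.array([10.0, 10.0])                           # observed log-spectral frame
post = component_posteriors(x, weights, noisy_means, variances)
prm = representative_mean(post, clean_means)
mask = estimate_mask(x, prm, threshold=3.0)
print(post, prm, mask)
```

In this toy setup the frame sits on the second noisy component, so the posterior mass falls almost entirely on that component and the representative mean is essentially its clean-model mean. The threshold of 3.0 is arbitrary and would need tuning on real data.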



