HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis

2008 Hands-Free Speech Communication and Microphone Arrays. Pub Date: 2008-05-06. DOI: 10.1093/ietisy/e91-d.9.2360

J. Park, J. Yoon, H. Kim

Citations: 8

Abstract

In this paper, we propose a new mask estimation method for computational auditory scene analysis (CASA) of speech using two microphones. The method is based on a hidden Markov model (HMM), which lets it exploit the observation that mask information should be correlated across contiguous analysis frames. Specifically, the HMM estimates the mask information, represented as the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask is then used to separate the desired speech from the noisy speech. To show the effectiveness of the proposed mask estimation, we compare it with a Gaussian kernel-based estimation method in terms of speech recognition performance. The proposed HMM-based method achieved an average word error rate reduction of 69.14% compared with the Gaussian kernel-based method.
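
The abstract does not give the model details, so the following is only a minimal sketch of the general idea: decoding a frame-correlated binary mask from ITD/ILD observations with a two-state HMM and the Viterbi algorithm. Everything specific here is an assumption made for illustration and is not taken from the paper: the two states (noise-dominant and target-dominant), the independent Gaussian emissions, the sticky transition probability `p_stay`, the units (ITD in ms, ILD in dB), and all numeric values in `EMIS` and `FRAMES`. A frame-by-frame classifier is included only as a contrast; it is not the Gaussian kernel-based baseline evaluated in the paper.

```python
import math


def log_gauss(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def log_emission(state, obs, emis):
    """Log-likelihood of one (itd, ild) observation under a state.

    Assumes ITD and ILD are independent given the state (illustrative only).
    """
    (itd_mean, itd_std), (ild_mean, ild_std) = emis[state]
    return log_gauss(obs[0], itd_mean, itd_std) + log_gauss(obs[1], ild_mean, ild_std)


def framewise_mask(frames, emis):
    """Classify each frame on its own, ignoring neighbouring frames."""
    return [max((0, 1), key=lambda s: log_emission(s, obs, emis)) for obs in frames]


def viterbi_mask(frames, emis, p_stay=0.9, p_init=(0.5, 0.5)):
    """Most likely state sequence (0 = noise-dominant, 1 = target-dominant).

    p_stay is a hypothetical self-transition probability that encodes the
    assumption that the mask changes slowly across contiguous frames.
    """
    if not frames:
        return []
    states = (0, 1)
    log_a = [[math.log(p_stay if i == j else 1.0 - p_stay) for j in states]
             for i in states]

    # Initialisation with the first frame.
    delta = [math.log(p_init[s]) + log_emission(s, frames[0], emis) for s in states]
    backpointers = []

    # Recursion: keep the best predecessor for each state at each frame.
    for obs in frames[1:]:
        prev = delta
        delta, ptr = [], []
        for j in states:
            best = max(states, key=lambda i: prev[i] + log_a[i][j])
            ptr.append(best)
            delta.append(prev[best] + log_a[best][j] + log_emission(j, obs, emis))
        backpointers.append(ptr)

    # Backtracking from the best final state.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical emission parameters: ((itd_mean, itd_std), (ild_mean, ild_std)).
EMIS = {
    0: ((0.5, 0.2), (6.0, 3.0)),  # noise-dominant: off-axis interferer
    1: ((0.0, 0.2), (0.0, 3.0)),  # target-dominant: frontal talker
}

# Hypothetical (itd, ild) track for one frequency channel; the third frame
# is an ambiguous outlier inside an otherwise target-dominant run.
FRAMES = [(0.0, 0.0), (0.05, 0.5), (0.3, 3.5), (0.0, -0.5),
          (0.5, 6.0), (0.55, 6.5), (0.45, 5.5)]

print("frame-wise:", framewise_mask(FRAMES, EMIS))
print("HMM       :", viterbi_mask(FRAMES, EMIS))
```

With these made-up numbers, the frame-wise decision flips the ambiguous third frame to noise, whereas the HMM path keeps it as target because two state switches cost more than the small likelihood gain. That is the kind of inter-frame correlation the abstract refers to.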
