Maximum kurtosis beamforming with a subspace filter for distant speech recognition

2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2011-12-01. DOI: 10.1109/ASRU.2011.6163927

K. Kumatani, J. McDonough, B. Raj


Citations: 9

Abstract

This paper presents a new beamforming method for distant speech recognition (DSR). The dominant mode subspace is used to efficiently estimate the active weight vectors for maximum kurtosis (MK) beamforming with the generalized sidelobe canceler (GSC). We demonstrated in [1], [2], [3] that beamforming based on the maximum kurtosis criterion can remove reverberation and noise effects without the signal cancellation encountered in conventional beamforming algorithms. The MK beamforming algorithm, however, requires a relatively large amount of data to estimate the active weight vector reliably, because it relies on a numerical optimization algorithm. To achieve efficient estimation, we propose cascading the subspace (eigenspace) filter [4, §6.8] with the active weight vector. The subspace filter decomposes the output of the blocking matrix into directional signals and ambient noise components. The ambient noise components are then averaged and subtracted from the beamformer's output, which leads to reliable estimation as well as a significant reduction in computation. We show the effectiveness of our method through a set of distant speech recognition experiments on real microphone array data captured in a real environment. Our new beamforming algorithm provided better recognition performance than the conventional beamforming techniques, a word error rate (WER) of 5.3%, which is comparable to the WER of 4.2% obtained with a close-talking microphone. Moreover, it achieved better recognition performance with a smaller amount of adaptation data than the conventional MK beamformer.
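The subspace step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, only one possible reading of the abstract. It assumes processing of a single frequency bin, a sample covariance estimate, and a fixed number of dominant modes. The function names `split_subspaces` and `kurtosis`, the parameter `num_dominant`, and the kurtosis definition with `beta = 3` are all hypothetical choices made for illustration.

```python
import numpy as np


def split_subspaces(Z, num_dominant):
    """Split blocking-matrix output into dominant-mode and ambient-noise parts.

    Z is a (channels, frames) complex array for one frequency bin (assumed layout).
    Returns the directional signals, one averaged ambient-noise signal,
    and the eigenvalues sorted in descending order.
    """
    # Sample covariance of the blocking-matrix output (Hermitian by construction).
    cov = Z @ Z.conj().T / Z.shape[1]
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Largest eigenvalues: dominant modes (directional signals).
    dominant_basis = eigvecs[:, :num_dominant]
    # Remaining eigenvectors: treated here as the ambient-noise subspace.
    noise_basis = eigvecs[:, num_dominant:]
    directional = dominant_basis.conj().T @ Z
    # Average the noise-subspace outputs into a single component (assumption).
    ambient = (noise_basis.conj().T @ Z).mean(axis=0, keepdims=True)
    return directional, ambient, eigvals


def kurtosis(y, beta=3.0):
    """Empirical kurtosis E|y|^4 - beta * (E|y|^2)^2 (assumed definition)."""
    power = np.mean(np.abs(y) ** 2)
    fourth_moment = np.mean(np.abs(y) ** 4)
    return fourth_moment - beta * power ** 2
```

Under this reading, an active weight vector would act on the stacked `directional` and `ambient` rows, which have `num_dominant + 1` dimensions rather than one per blocking-matrix output channel. That smaller dimension is one plausible source of the reduced data and computation requirements the abstract reports, and `kurtosis` stands in for the objective that the beamformer output would maximize.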
