Maximum kurtosis beamforming with a subspace filter for distant speech recognition

K. Kumatani, J. McDonough, B. Raj
{"title":"Maximum kurtosis beamforming with a subspace filter for distant speech recognition","authors":"K. Kumatani, J. McDonough, B. Raj","doi":"10.1109/ASRU.2011.6163927","DOIUrl":null,"url":null,"abstract":"This paper presents a new beamforming method for distant speech recognition (DSR). The dominant mode subspace is considered in order to efficiently estimate the active weight vectors for maximum kurtosis (MK) beamforming with the generalized sidelobe canceler (GSC). We demonstrated in [1], [2], [3] that the beamforming method based on the maximum kurtosis criterion can remove reverberant and noise effects without signal cancellation encountered in the conventional beamforming algorithms. The MK beamforming algorithm, however, required a relatively large amount of data for reliably estimating the active weight vector because it relies on a numerical optimization algorithm. In order to achieve efficient estimation, we propose to cascade the subspace (eigenspace) filter [4, §6.8] with the active weight vector. The subspace filter can decompose the output of the blocking matrix into directional signals and ambient noise components. Then, the ambient noise components are averaged and would be subtracted from the beamformer's output, which leads to reliable estimation as well as significant computational reduction. We show the effectiveness of our method through a set of distant speech recognition experiments on real microphone array data captured in the real environment. Our new beamforming algorithm provided the best recognition performance among conventional beamforming techniques, a word error rate (WER) of 5.3 %, which is comparable to the WER of 4.2 % obtained with a close-talking microphone. Moreover, it achieved better recognition performance with a fewer amounts of adaptation data than the conventional MK beamformer.","PeriodicalId":338241,"journal":{"name":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2011.6163927","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

This paper presents a new beamforming method for distant speech recognition (DSR). The dominant mode subspace is considered in order to efficiently estimate the active weight vectors for maximum kurtosis (MK) beamforming with the generalized sidelobe canceler (GSC). We demonstrated in [1], [2], [3] that the beamforming method based on the maximum kurtosis criterion can remove reverberant and noise effects without signal cancellation encountered in the conventional beamforming algorithms. The MK beamforming algorithm, however, required a relatively large amount of data for reliably estimating the active weight vector because it relies on a numerical optimization algorithm. In order to achieve efficient estimation, we propose to cascade the subspace (eigenspace) filter [4, §6.8] with the active weight vector. The subspace filter can decompose the output of the blocking matrix into directional signals and ambient noise components. Then, the ambient noise components are averaged and would be subtracted from the beamformer's output, which leads to reliable estimation as well as significant computational reduction. We show the effectiveness of our method through a set of distant speech recognition experiments on real microphone array data captured in the real environment. Our new beamforming algorithm provided the best recognition performance among conventional beamforming techniques, a word error rate (WER) of 5.3 %, which is comparable to the WER of 4.2 % obtained with a close-talking microphone. Moreover, it achieved better recognition performance with a fewer amounts of adaptation data than the conventional MK beamformer.
远距离语音识别的子空间滤波器最大峰度波束形成
提出了一种新的远距离语音识别波束形成方法。为了利用广义旁瓣对消器(GSC)有效估计最大峰度波束形成的有效权向量,考虑了主模子空间。我们在[1],[2],[3]中证明了基于最大峰度准则的波束形成方法可以消除混响和噪声影响,而不会遇到传统波束形成算法中的信号抵消问题。然而,MK波束形成算法依赖于数值优化算法,需要相对大量的数据来可靠地估计有效权向量。为了实现有效的估计,我们提出将子空间(特征空间)滤波器[4,§6.8]与主动权向量级联。子空间滤波器可以将阻塞矩阵的输出分解为方向信号和环境噪声分量。然后,将环境噪声分量平均并从波束形成器的输出中减去,从而得到可靠的估计并显著减少计算量。我们通过一组在真实环境中捕获的真实麦克风阵列数据的远程语音识别实验证明了该方法的有效性。我们的新波束形成算法在传统的波束形成技术中提供了最好的识别性能,单词错误率(WER)为5.3%,与近距离说话麦克风获得的4.2%的错误率相当。此外,与传统的MK波束形成器相比,该方法在自适应数据量较少的情况下取得了更好的识别性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信