Attention-Based Beamformer For Multi-Channel Speech Enhancement
Jinglin Bai, Hao Li, Xueliang Zhang, Fei Chen
arXiv - EE - Audio and Speech Processing, 2024-09-10
DOI: arxiv-2409.06456
Citations: 0
Abstract
Minimum Variance Distortionless Response (MVDR) is a classical adaptive
beamformer that theoretically ensures the distortionless transmission of
signals in the target direction. In practice, its noise-reduction performance
depends on the accuracy of the noise spatial covariance matrix (SCM) estimate.
Although deep learning has recently shown remarkable performance in multi-channel
speech enhancement, the distortionless-response property still makes MVDR
highly popular in real applications. In this paper, we propose an
attention-based mechanism to calculate the speech and noise SCM and then apply
MVDR to obtain the enhanced speech. Moreover, a deep learning architecture
using the inplace convolution operator and frequency-independent LSTM has
proven effective in facilitating SCM estimation. The model is optimized in an
end-to-end manner. Experimental results indicate that the proposed method is
extremely effective in tracking moving or stationary speakers under non-causal
and causal conditions, outperforming other baselines. Notably, our model has
only 0.35 million parameters, making it easy to deploy on edge devices.
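
To make the distortionless constraint concrete, the sketch below computes textbook MVDR weights for a single frequency bin, w = R_n^{-1} d / (d^H R_n^{-1} d), from a noise SCM. It is a minimal illustration and not the authors' attention-based estimator: the SCM here is a plain sample average, and the steering vector, array size, and synthetic noise are assumptions chosen only for the demonstration.

```python
import numpy as np


def estimate_scm(frames):
    """Sample SCM from frames of shape (num_frames, num_mics)."""
    # Average of the outer products x x^H over all frames.
    return frames.T @ frames.conj() / frames.shape[0]


def mvdr_weights(noise_scm, steering):
    """Textbook MVDR weights for one frequency bin."""
    # Solve R_n x = d rather than forming an explicit inverse.
    rinv_d = np.linalg.solve(noise_scm, steering)
    return rinv_d / (steering.conj() @ rinv_d)


rng = np.random.default_rng(0)
num_mics, num_frames = 4, 2000

# Hypothetical steering vector (assumption): unit-modulus phase shifts,
# with the first microphone as the reference (steering[0] == 1).
steering = np.exp(-1j * np.pi * 0.3 * np.arange(num_mics))

# Synthetic spatially correlated complex noise (assumption, for illustration).
mix = rng.standard_normal((num_mics, num_mics)) \
    + 1j * rng.standard_normal((num_mics, num_mics))
white = rng.standard_normal((num_frames, num_mics)) \
    + 1j * rng.standard_normal((num_frames, num_mics))
noise = white @ mix.T

noise_scm = estimate_scm(noise)
w = mvdr_weights(noise_scm, steering)

# Distortionless response: the gain toward the target, w^H d, equals 1.
gain = w.conj() @ steering

# Beamformer output y_t = w^H x_t for every noise frame.
out_power = np.mean(np.abs(noise @ w.conj()) ** 2)
ref_power = noise_scm[0, 0].real

print("target gain:", gain)
print("noise power at reference mic:", ref_power)
print("noise power after MVDR:", out_power)
```

Because the reference-microphone selector also satisfies the unit-gain constraint in this setup, the MVDR output noise power cannot exceed the reference-microphone noise power when both are measured with the same SCM. The quality of the result therefore rests on how well the noise SCM is estimated, which is the part the proposed method addresses.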