Xiaoxue Luo, Yuxuan Ke, Xiaodong Li, Chengshi Zheng
Deep informed spatio-spectral filtering for multi-channel speech extraction against steering vector uncertainties
Applied Acoustics (journal article), published 2024-09-05
DOI: 10.1016/j.apacoust.2024.110259
URL: https://www.sciencedirect.com/science/article/pii/S0003682X24004109
Impact factor: 3.4; JCR: Q1 (Acoustics)
Citations: 0
Abstract
Adaptive beamforming combined with post-filtering is one of the most widely used techniques for suppressing directional interference, ambient noise, and reverberation. However, many adaptive beamforming methods are sensitive to steering vector mismatch, and their performance degrades considerably in practical applications, despite extensive earlier efforts to improve robustness. To achieve better performance in challenging scenarios, this paper proposes a two-stage deep informed spatio-spectral filtering method for multi-channel speech extraction, which removes interference, noise, and reverberation simultaneously in the presence of steering vector errors. In the first stage, a direction-informed dual-path beamforming network was introduced to extract the target directional speech together with only its early reflections. To improve robustness, an information rectification block was designed to compensate for the signal model mismatch, and the steering vector uncertainty was taken into account in the training phase. In addition, a dual-path beamforming module was adopted to reduce magnitude distortion and improve phase recovery simultaneously. In the second stage, a magnitude-phase fusion network was proposed as a post-processing module to further fuse the magnitude and phase estimated by the first stage. Experimental results confirmed that the proposed method was more robust to the signal model mismatch and achieved better speech quality and intelligibility than the baseline methods.
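To make the sensitivity to steering vector mismatch concrete, the following is a minimal sketch of a classical narrowband MVDR beamformer, not the network proposed in the paper. The array geometry (8 microphones, 4 cm spacing), the 2 kHz frequency, the power levels, and all function names are illustrative assumptions. The covariance matrix is assumed to contain the target signal plus white noise, which is the situation in which a wrong look direction leads the beamformer to cancel the target itself.

```python
import cmath
import math

M = 8            # number of microphones (assumed uniform linear array)
SPACING = 0.04   # microphone spacing in metres (assumed)
FREQ = 2000.0    # narrowband frequency in Hz (assumed)
C = 343.0        # speed of sound in m/s


def steering(theta_deg):
    """Far-field steering vector of the array for a direction in degrees."""
    phase = 2 * math.pi * FREQ * SPACING * math.sin(math.radians(theta_deg)) / C
    return [cmath.exp(-1j * phase * m) for m in range(M)]


def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def mvdr_gain(true_deg, assumed_deg, sig_pow=100.0, noise_pow=1.0):
    """Magnitude response toward the true target direction of an MVDR
    beamformer that was steered to assumed_deg."""
    a_t = steering(true_deg)
    a_hat = steering(assumed_deg)
    # R = sig_pow * a_t a_t^H + noise_pow * I, inverted in closed form
    # with the Sherman-Morrison identity (a_t^H a_t = M).
    beta = sig_pow / (noise_pow + sig_pow * M)
    rho = inner(a_t, a_hat)
    rinv_a = [(ah - at * beta * rho) / noise_pow for ah, at in zip(a_hat, a_t)]
    # MVDR weights: w = R^{-1} a_hat / (a_hat^H R^{-1} a_hat)
    denom = inner(a_hat, rinv_a).real
    w = [v / denom for v in rinv_a]
    return abs(inner(w, a_t))


if __name__ == "__main__":
    print("gain, exact steering vector:", mvdr_gain(0.0, 0.0))
    print("gain, 10 degree mismatch:   ", mvdr_gain(0.0, 10.0))
```

With the exact steering vector the response toward the target is distortionless (gain close to 1), while under these assumed settings a 10 degree look-direction error drives the gain toward zero. This target self-cancellation is the kind of degradation that robust methods, including the training-time treatment of steering vector uncertainty described in the abstract, aim to avoid.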
About the journal:
Since its launch in 1968, Applied Acoustics has been publishing high quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense.
Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it should appear only as an integral part of a practical solution to a problem and be supported by data. Applied Acoustics publishes complete papers, short technical notes, and review articles.
Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.