Integrating Data Priors to Weighted Prediction Error for Speech Dereverberation

IF 5.1, Zone 2 (Computer Science), Q1 ACOUSTICS

IEEE/ACM Transactions on Audio, Speech, and Language Processing Pub Date : 2024-08-19 DOI:10.1109/TASLP.2024.3440003

Ziye Yang;Wenxing Yang;Kai Xie;Jie Chen

Citations: 0

Abstract

Speech dereverberation aims to alleviate the detrimental effects of late-reverberant components. While the weighted prediction error (WPE) method has shown superior performance in dereverberation, its performance and robustness in complex and noisy environments can still be improved. Recent research has highlighted the effectiveness of integrating physics-based and data-driven methods, which enhances the performance of various signal processing tasks while maintaining interpretability. Motivated by these advances, this paper presents a novel dereverberation framework for the single-source case that incorporates data-driven methods for capturing speech priors within the WPE framework. The plug-and-play (PnP) framework, specifically the regularization by denoising (RED) strategy, is used to incorporate speech prior information learned from data into the iterations that solve the optimization problem. Experimental results validate the effectiveness of the proposed approach.
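To give a feel for the RED strategy mentioned in the abstract, the following is a minimal sketch on a toy problem and is not the paper's algorithm. It assumes a real-valued 1-D denoising task rather than WPE in the time-frequency domain, and it swaps the learned speech prior for a plain moving-average filter. The function names (`moving_average_denoiser`, `red_gradient_descent`) and all parameter values are hypothetical choices made for illustration. The sketch minimizes 0.5·||x − y||² + 0.5·λ·xᵀ(x − D(x)), using the RED gradient λ·(x − D(x)), which is exact for a symmetric linear denoiser such as this one.

```python
import numpy as np


def moving_average_denoiser(x, width=5):
    """Stand-in denoiser D(x): a symmetric moving average.

    Assumption for illustration only; a learned speech denoiser would
    take this role in a data-driven setting.
    """
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


def red_gradient_descent(y, denoiser, lam=2.0, step=0.25, n_iter=200):
    """Gradient descent on 0.5*||x - y||^2 + 0.5*lam*x^T (x - D(x))."""
    x = y.copy()
    for _ in range(n_iter):
        data_grad = x - y                      # gradient of the data-fidelity term
        prior_grad = lam * (x - denoiser(x))   # RED gradient of the prior term
        x = x - step * (data_grad + prior_grad)
    return x


def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))


# Toy data: a slow sinusoid corrupted by white Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

estimate = red_gradient_descent(noisy, moving_average_denoiser)
print("MSE of noisy input :", mse(noisy, clean))
print("MSE of RED estimate:", mse(estimate, clean))
```

The point of the sketch is the structure of the update: the denoiser is called as a black box inside each iteration, so a different prior can be plugged in without changing the data-fidelity term.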

Source journal

IEEE/ACM Transactions on Audio, Speech, and Language Processing (ACOUSTICS; ENGINEERING, ELECTRICAL & ELECTRONIC)

CiteScore: 11.30

Self-citation rate: 11.10%

Articles published: 217

About the journal: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech, and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering, and document indexing and retrieval, as well as general language modeling.