A Deep Proximal-Unfolding Method for Monaural Speech Dereverberation

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Pub Date: 2022-11-07. DOI: 10.23919/APSIPAASC55919.2022.9979935

Meihuang Wang, Minmin Yuan, Andong Li, C. Zheng, Xiaodong Li



Abstract

Speech captured in an enclosure is often distorted by reverberation when the microphone is placed far from the speech source, which reduces speech quality and intelligibility. With the development of deep neural networks in recent years, many deep learning-based methods have been proposed for dereverberation. Most of them remove reverberation by directly mapping reverberant speech to the target speech; this approach often lacks adequate interpretability, which limits the performance upper bound. This paper proposes a deep unfolding method with an interpretable network structure. First, the dereverberation problem is reformulated based on the maximum a posteriori (MAP) criterion, and an iterative optimization algorithm is devised using proximal operators. Second, the iterative optimization algorithm is unfolded into a multi-stage deep neural network, where each stage corresponds to a specific operation of the iterative procedure. Experiments were conducted on the WSJ0-SI84 corpus, and the results on both simulated and real room impulse responses (RIRs) show that the proposed model outperforms previous models and achieves state-of-the-art performance in terms of PESQ, ESTOI, and frequency-weighted segmental SNR.
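The idea of unfolding a proximal iteration into a fixed number of stages can be illustrated with a small, generic sketch. The code below is not the authors' model: the abstract does not give the objective, the proximal operators, or the network modules. As a stand-in it assumes a toy objective 0.5*||y - Hx||^2 + lam*||x||_1, a made-up decaying impulse response, and a hand-written soft-threshold as the proximal step. All function names and parameter values are hypothetical. In a trained unfolded network, the per-stage step sizes and thresholds would typically be learned, and the proximal step might be replaced by a learned module.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def convolution_matrix(h, n):
    """Matrix H such that H @ x equals the full convolution of h with x."""
    H = np.zeros((n + len(h) - 1, n))
    for i in range(n):
        H[i:i + len(h), i] = h
    return H

def objective(x, y, H, lam):
    """Toy MAP-style cost: data fidelity plus an l1 prior (an assumption)."""
    return 0.5 * np.sum((y - H @ x) ** 2) + lam * np.sum(np.abs(x))

def unfolded_proximal_gradient(y, H, step_sizes, lams):
    """Run one stage per (step size, lam) pair.

    Each stage is one proximal-gradient iteration: a gradient step on the
    data-fidelity term followed by the proximal operator of the prior.
    """
    x = np.zeros(H.shape[1])
    for mu, lam in zip(step_sizes, lams):
        grad = H.T @ (H @ x - y)                  # gradient of 0.5*||y - Hx||^2
        x = soft_threshold(x - mu * grad, mu * lam)
    return x

# Toy data: a sparse "dry" signal smeared by a decaying impulse response.
rng = np.random.default_rng(0)
h = 0.6 ** np.arange(8)                           # made-up impulse response
x_true = np.zeros(64)
x_true[[5, 20, 41]] = [1.0, -0.7, 0.5]
H = convolution_matrix(h, x_true.size)
y = H @ x_true + 0.01 * rng.standard_normal(H.shape[0])

L = np.linalg.norm(H, 2) ** 2                     # Lipschitz constant of the gradient
K = 50                                            # number of unfolded stages
lam = 0.01
x_hat = unfolded_proximal_gradient(y, H, [1.0 / L] * K, [lam] * K)

print("cost at zero init:", objective(np.zeros_like(x_hat), y, H, lam))
print("cost after K stages:", objective(x_hat, y, H, lam))
```

Here every stage shares the same step size and threshold, so the loop is ordinary proximal gradient descent. Passing a separate pair of values per stage is what would make the stages individually tunable.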

