A Deep Proximal-Unfolding Method for Monaural Speech Dereverberation
Meihuang Wang, Minmin Yuan, Andong Li, C. Zheng, Xiaodong Li
2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), published 2022-11-07
DOI: 10.23919/APSIPAASC55919.2022.9979935
Abstract
Speech recorded in an enclosure is often distorted by reverberation when the microphone is placed far from the speech source, which reduces speech quality and intelligibility. With the development of deep neural networks in recent years, many deep learning-based dereverberation methods have been proposed. Most of them remove reverberation by directly mapping reverberant speech to the target speech; such mappings often lack interpretability, which limits their performance upper bound. This paper proposes a deep unfolding method with an interpretable network structure. First, the dereverberation problem is reformulated under the maximum a posteriori (MAP) criterion, and an iterative optimization algorithm is derived using proximal operators. Second, the iterative algorithm is unfolded into a multi-stage deep neural network, in which each stage corresponds to a specific operation of the iterative procedure. Experiments were conducted on the WSJ0-SI84 corpus, and results with both simulated and real room impulse responses (RIRs) show that the proposed model outperforms previous models and achieves state-of-the-art performance in terms of PESQ, ESTOI, and frequency-weighted segmental SNR.
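The general proximal-unfolding idea can be illustrated on a toy problem. The sketch below is a generic, hand-tuned unrolling of ISTA (proximal gradient descent), not the authors' network: it assumes a noise-free linear convolutive model y = h * x with a known filter h and a Laplacian (l1) prior on x, so the MAP objective becomes 1/2 ||y - Hx||^2 + lambda ||x||_1 and each iteration is x <- prox_{eta*lambda*||.||_1}(x - eta * H^T (Hx - y)). Fixing the number of iterations and giving each one its own (step size, threshold) pair turns the loop into a stack of stages. In the paper the RIR is unknown, the signals are real speech, and the per-stage operations are learned; here the soft threshold merely stands in for a learned proximal module. All function names (`convolve`, `adjoint_convolve`, `soft_threshold`, `unfolded_dereverb`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution y = h * x (the assumed forward model H x)."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def adjoint_convolve(r: Sequence[float], h: Sequence[float], n: int) -> List[float]:
    """Adjoint H^T r of the full convolution, returning a length-n signal."""
    return [sum(hj * r[i + j] for j, hj in enumerate(h)) for i in range(n)]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|, i.e. of the assumed l1 prior."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def unfolded_dereverb(
    y: Sequence[float],
    h: Sequence[float],
    n: int,
    stages: Sequence[Tuple[float, float]],
) -> List[float]:
    """Apply one proximal-gradient step per stage.

    Each stage has its own (step size eta, threshold lam). In a deep-unfolded
    network these would be trained, and soft_threshold would be replaced by a
    learned module; here they are fixed by hand.
    """
    x = [0.0] * n
    for eta, lam in stages:
        residual = [a - b for a, b in zip(convolve(x, h), y)]  # H x - y
        grad = adjoint_convolve(residual, h, n)                 # H^T (H x - y)
        x = [soft_threshold(xi - eta * gi, eta * lam) for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    h = [1.0, 0.5, 0.25]                                # toy "RIR", assumed known
    x_true = [0.0, 1.0, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0]  # sparse toy source
    y = convolve(x_true, h)                             # noise-free observation
    # Step size 0.3 stays below 1 / ||H||^2 for this filter.
    for k in (5, 200):
        x_hat = unfolded_dereverb(y, h, len(x_true), [(0.3, 1e-3)] * k)
        err = max(abs(a - b) for a, b in zip(x_hat, x_true))
        print(f"{k} stages: max abs error {err:.4f}")
```

Adding stages reduces the reconstruction error here because each stage is one more optimization step; deep unfolding keeps the stage count small and instead learns the per-stage parameters and proximal mappings from data.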