USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering

IF 4.1 2区 计算机科学 Q1 ACOUSTICS
Zhong-Qiu Wang
{"title":"USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering","authors":"Zhong-Qiu Wang","doi":"10.1109/TASLP.2024.3445120","DOIUrl":null,"url":null,"abstract":"In reverberant conditions with a single speaker, each far-field microphone records a reverberant version of the same speaker signal at a different location. In over-determined conditions, where there are multiple microphones but only one speaker, each recorded mixture signal can be leveraged as a constraint to narrow down the solutions to target anechoic speech and thereby reduce reverberation. Equipped with this insight, we propose USDnet, a novel deep neural network (DNN) approach for unsupervised speech dereverberation (USD). At each training step, we first feed an input mixture to USDnet to produce an estimate for target speech, and then linearly filter the DNN estimate to approximate the multi-microphone mixture so that the constraint can be satisfied at each microphone, thereby regularizing the DNN estimate to approximate target anechoic speech. The linear filter can be estimated based on the mixture and DNN estimate via neural forward filtering algorithms such as forward convolutive prediction. We show that this novel methodology can promote unsupervised dereverberation of single-source reverberant speech.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3882-3895"},"PeriodicalIF":4.1000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10638210/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
引用次数: 0

Abstract

In reverberant conditions with a single speaker, each far-field microphone records a reverberant version of the same speaker signal at a different location. In over-determined conditions, where there are multiple microphones but only one speaker, each recorded mixture signal can be leveraged as a constraint to narrow down the solutions to target anechoic speech and thereby reduce reverberation. Equipped with this insight, we propose USDnet, a novel deep neural network (DNN) approach for unsupervised speech dereverberation (USD). At each training step, we first feed an input mixture to USDnet to produce an estimate for target speech, and then linearly filter the DNN estimate to approximate the multi-microphone mixture so that the constraint can be satisfied at each microphone, thereby regularizing the DNN estimate to approximate target anechoic speech. The linear filter can be estimated based on the mixture and DNN estimate via neural forward filtering algorithms such as forward convolutive prediction. We show that this novel methodology can promote unsupervised dereverberation of single-source reverberant speech.
USDnet:通过神经前向滤波实现无监督语音消除混响
在单个扬声器的混响条件下,每个远场麦克风都会在不同位置记录同一扬声器信号的混响版本。在过度确定的条件下,即有多个麦克风但只有一个扬声器时,每个记录的混合信号都可以作为约束条件加以利用,以缩小针对消声语音的解决方案的范围,从而减少混响。有鉴于此,我们提出了一种用于无监督语音消除混响(USD)的新型深度神经网络(DNN)方法--USDnet。在每个训练步骤中,我们首先向 USDnet 输入一个输入混合物,以生成目标语音的估计值,然后对 DNN 估计值进行线性过滤,以近似多麦克风混合物,从而使每个麦克风都能满足约束条件,从而使 DNN 估计值正则化,以近似目标消混响语音。线性滤波器可通过神经前向滤波算法(如前向卷积预测)根据混合物和 DNN 估计值进行估计。我们的研究表明,这种新方法可以促进单源混响语音的无监督消除混响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE/ACM Transactions on Audio, Speech, and Language Processing
IEEE/ACM Transactions on Audio, Speech, and Language Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
11.30
自引率
11.10%
发文量
217
期刊介绍: The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信