Speech enhancement by iterating forward pass through U-net

Tomasz Grzywalski, S. Drgas
2020 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), published 2020-09-23
DOI: 10.23919/spa50552.2020.9241307 (https://doi.org/10.23919/spa50552.2020.9241307)
Citations: 1

Abstract

In recent years, speech enhancement has made great progress, driven mostly by bigger and more sophisticated neural networks. In this work we investigate the possibility of taking a state-of-the-art speech enhancement neural network and modifying it so that it can process the noisy signal multiple times. We expect the enhancement to improve with each iteration. Experiments conducted using the WSJ0, Noisex-92 and DCASE datasets show that a U-net with gated dilated convolutions achieves better SI-SDR, STOI and PESQ after processing the noisy signal two times, with the improvement being consistent across all SNRs and tested noise types. This is achieved without any additional trainable parameters and with no additional memory requirements compared to the baseline model.
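The abstract does not say how the network is modified to allow repeated processing, so the following is only a minimal sketch under the simplest reading: the same enhancement function is applied again to its own output, which adds no trainable parameters. The names `iterate_enhancer` and `toy_enhance` are hypothetical, and the moving-average filter is merely a stand-in for the U-net, not the authors' model. The `si_sdr` helper follows the standard scale-invariant SDR definition (project the estimate onto the reference, then compare target energy to residual energy).

```python
import math
from typing import Callable, List, Sequence

Signal = List[float]


def iterate_enhancer(enhance: Callable[[Signal], Signal],
                     noisy: Sequence[float],
                     n_iter: int = 2) -> Signal:
    """Apply the same enhancer n_iter times, feeding each output back in.

    Assumption: plain output-to-input feedback; the paper's actual
    modification may differ.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    estimate = list(noisy)
    for _ in range(n_iter):
        estimate = enhance(estimate)  # same function, so same weights each pass
    return estimate


def toy_enhance(x: Signal) -> Signal:
    """Hypothetical stand-in for the network: 3-tap moving average."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SDR in dB (undefined if the residual is exactly zero)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy
    target = [scale * r for r in reference]          # projection onto reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    # Synthetic data for illustration only; not from WSJ0, Noisex-92 or DCASE.
    clean = [math.sin(2 * math.pi * k / 50) for k in range(200)]
    noisy = [c + 0.3 * (-1) ** k for k, c in enumerate(clean)]
    for passes in (1, 2):
        enhanced = iterate_enhancer(toy_enhance, noisy, n_iter=passes)
        print(passes, "pass(es): SI-SDR =", si_sdr(enhanced, clean), "dB")
```

Whether a second pass helps depends entirely on the enhancer; the toy filter here says nothing about the gains reported for the U-net with gated dilated convolutions.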