Speech Enhancement Using Dilated Wave-U-Net: an Experimental Analysis

2020 27th Conference of Open Innovations Association (FRUCT). Publication date: 2020-09-01. DOI: 10.23919/fruct49677.2020.9211072

Mohamed Nabih Ali, A. Brutti, D. Falavigna

Citations: 6

Abstract

Speech enhancement is a relevant component of many real-world applications, such as hearing aids, mobile telecommunications, and healthcare. In this paper, we investigate the Dilated Wave-U-Net model, a recently proposed end-to-end neural speech enhancement approach based on the Wave-U-Net architecture. We evaluate the model on two datasets: the public VCTK dataset and a contaminated version of the LibriSpeech dataset. In particular, we experiment with alternative losses: the MSE loss, the L1 norm, and a combination of the L1 and MSE losses. Results show that the Dilated Wave-U-Net architecture outperforms other state-of-the-art methods in terms of intelligibility and quality metrics on both datasets, and that the MSE loss performs best.
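The three loss variants named in the abstract can be illustrated with a minimal sketch on raw waveform samples. The abstract does not state how the L1 and MSE terms are combined, so the weighted sum and the `alpha` parameter below are assumptions, as are the function names; this is not the authors' implementation.

```python
# Illustrative sketch only: time-domain losses between a clean reference
# waveform and an enhanced (model output) waveform, given as lists of floats.

def mse_loss(clean, enhanced):
    """Mean squared error over all samples."""
    return sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)

def l1_loss(clean, enhanced):
    """Mean absolute error (L1 norm averaged over samples)."""
    return sum(abs(c - e) for c, e in zip(clean, enhanced)) / len(clean)

def combined_loss(clean, enhanced, alpha=0.5):
    """Weighted sum of L1 and MSE.

    ASSUMPTION: the weighting scheme and alpha are hypothetical; the
    abstract only says "a combination of L1 and MSE losses".
    """
    return alpha * l1_loss(clean, enhanced) + (1.0 - alpha) * mse_loss(clean, enhanced)

clean = [0.0, 1.0, -1.0, 0.5]
enhanced = [0.0, 0.5, -1.0, 1.5]
print(mse_loss(clean, enhanced))       # 0.3125
print(l1_loss(clean, enhanced))        # 0.375
print(combined_loss(clean, enhanced))  # 0.34375
```

MSE penalizes large sample errors quadratically, while L1 grows linearly with the error, which is why the two can rank the same pair of outputs differently.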
