Speech Enhancement Using Dilated Wave-U-Net: an Experimental Analysis

2020 27th Conference of Open Innovations Association (FRUCT). Pub Date: 2020-09-01. DOI: 10.23919/fruct49677.2020.9211072

Mohamed Nabih Ali, A. Brutti, D. Falavigna


Citations: 6

Abstract

Speech enhancement is a relevant component in many real-world applications such as hearing aid devices, mobile telecommunications, and healthcare applications. In this paper, we investigate the Dilated Wave-U-Net model, a recently proposed end-to-end neural speech enhancement approach based on the Wave-U-Net architecture. We evaluate the model on two datasets: the public VCTK dataset and a contaminated version of the Librispeech dataset. In particular, we experiment with alternative losses: the MSE loss, the L1 norm, and a combination of the L1 and MSE losses. Results show that the Dilated Wave-U-Net architecture outperforms other state-of-the-art methods in terms of intelligibility and quality metrics on both datasets, and that the MSE loss performs best.
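The three loss options named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the L1 and MSE terms are combined, so the weighted sum and the `alpha` parameter below are assumptions, as are the function names.

```python
from typing import Sequence


def _check(estimate: Sequence[float], target: Sequence[float]) -> None:
    """Reject empty or mismatched waveforms before computing a loss."""
    if len(estimate) == 0 or len(estimate) != len(target):
        raise ValueError("waveforms must be non-empty and equal in length")


def mse_loss(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between an enhanced and a clean waveform."""
    _check(estimate, target)
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(estimate)


def l1_loss(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error (L1 norm averaged over samples)."""
    _check(estimate, target)
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(estimate)


def combined_loss(
    estimate: Sequence[float],
    target: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight; not taken from the paper
) -> float:
    """Assumed combination: alpha * L1 + (1 - alpha) * MSE."""
    return alpha * l1_loss(estimate, target) + (1 - alpha) * mse_loss(estimate, target)


if __name__ == "__main__":
    enhanced = [1.0, 2.0]  # toy "enhanced" samples
    clean = [0.0, 0.0]     # toy "clean" samples
    print(mse_loss(enhanced, clean))
    print(l1_loss(enhanced, clean))
    print(combined_loss(enhanced, clean))
```

Because MSE squares each sample error, it penalizes large deviations more heavily than L1 does, which is one reason the choice of loss can change the measured enhancement quality.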
