基于对抗性网络的深度后悔损失语音增强

2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS) Pub Date : 2022-03-30 DOI:10.1109/NISS55057.2022.10085296

H. Pardede, Vicky Zilvan, A. Ramdan, A. R. Yuliani, Endang Suryawati, Renni Kusumowardani

{"title":"基于对抗性网络的深度后悔损失语音增强","authors":"H. Pardede, Vicky Zilvan, A. Ramdan, A. R. Yuliani, Endang Suryawati, Renni Kusumowardani","doi":"10.1109/NISS55057.2022.10085296","DOIUrl":null,"url":null,"abstract":"Speech enhancement is often applied for speech-based systems due to the proneness of speech signals to additive background noise. While speech processing-based methods are traditionally used for speech enhancement, with advancements in deep learning technologies, many efforts have been made to implement them for speech enhancement. Using deep learning, the networks learn mapping functions from noisy data to clean ones and then learn to reconstruct the clean speech signals. As a consequence, deep learning methods can reduce what is so-called musical noise that is often found in traditional speech enhancement methods. Currently, one popular deep learning architecture for speech enhancement is generative adversarial networks (GAN). However, the cross-entropy loss that is employed in GAN often causes the training to be unstable. So, in many implementations of GAN, the cross-entropy loss is replaced with the least-square loss. In this paper, to improve the training stability of GAN using cross-entropy loss, we propose to use deep regret analytic generative adversarial networks (Dragan) for speech enhancements. It is based on applying a gradient penalty on cross-entropy loss. We also employ relativistic rules to stabilize the training of GAN. Then, we applied it to the least square and Dragan losses. Our experiments suggest that the proposed method improve the quality of speech better than the least-square loss on several objective quality metrics.","PeriodicalId":138637,"journal":{"name":"2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS)","volume":"190 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adversarial Networks-Based Speech Enhancement with Deep Regret Loss\",\"authors\":\"H. Pardede, Vicky Zilvan, A. Ramdan, A. R. Yuliani, Endang Suryawati, Renni Kusumowardani\",\"doi\":\"10.1109/NISS55057.2022.10085296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech enhancement is often applied for speech-based systems due to the proneness of speech signals to additive background noise. While speech processing-based methods are traditionally used for speech enhancement, with advancements in deep learning technologies, many efforts have been made to implement them for speech enhancement. Using deep learning, the networks learn mapping functions from noisy data to clean ones and then learn to reconstruct the clean speech signals. As a consequence, deep learning methods can reduce what is so-called musical noise that is often found in traditional speech enhancement methods. Currently, one popular deep learning architecture for speech enhancement is generative adversarial networks (GAN). However, the cross-entropy loss that is employed in GAN often causes the training to be unstable. So, in many implementations of GAN, the cross-entropy loss is replaced with the least-square loss. In this paper, to improve the training stability of GAN using cross-entropy loss, we propose to use deep regret analytic generative adversarial networks (Dragan) for speech enhancements. It is based on applying a gradient penalty on cross-entropy loss. We also employ relativistic rules to stabilize the training of GAN. Then, we applied it to the least square and Dragan losses. Our experiments suggest that the proposed method improve the quality of speech better than the least-square loss on several objective quality metrics.\",\"PeriodicalId\":138637,\"journal\":{\"name\":\"2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS)\",\"volume\":\"190 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NISS55057.2022.10085296\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NISS55057.2022.10085296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

由于语音信号容易受到背景噪声的影响，语音增强通常应用于基于语音的系统。虽然基于语音处理的方法传统上用于语音增强，但随着深度学习技术的进步，已经做出了许多努力来实现语音增强。使用深度学习，网络学习从噪声数据到干净数据的映射函数，然后学习重建干净的语音信号。因此，深度学习方法可以减少传统语音增强方法中经常出现的所谓音乐噪音。目前，一种流行的用于语音增强的深度学习架构是生成对抗网络(GAN)。然而，GAN中使用的交叉熵损失往往会导致训练不稳定。因此，在GAN的许多实现中，交叉熵损失被最小二乘损失所取代。在本文中，为了利用交叉熵损失提高GAN的训练稳定性，我们提出使用深度遗憾分析生成对抗网络(Dragan)进行语音增强。它基于对交叉熵损失施加梯度惩罚。我们还使用相对论规则来稳定GAN的训练。然后，我们将其应用于最小二乘损失和Dragan损失。我们的实验表明，在几个客观质量指标上，所提出的方法比最小二乘损失更好地提高了语音质量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Adversarial Networks-Based Speech Enhancement with Deep Regret Loss

Speech enhancement is often applied for speech-based systems due to the proneness of speech signals to additive background noise. While speech processing-based methods are traditionally used for speech enhancement, with advancements in deep learning technologies, many efforts have been made to implement them for speech enhancement. Using deep learning, the networks learn mapping functions from noisy data to clean ones and then learn to reconstruct the clean speech signals. As a consequence, deep learning methods can reduce what is so-called musical noise that is often found in traditional speech enhancement methods. Currently, one popular deep learning architecture for speech enhancement is generative adversarial networks (GAN). However, the cross-entropy loss that is employed in GAN often causes the training to be unstable. So, in many implementations of GAN, the cross-entropy loss is replaced with the least-square loss. In this paper, to improve the training stability of GAN using cross-entropy loss, we propose to use deep regret analytic generative adversarial networks (Dragan) for speech enhancements. It is based on applying a gradient penalty on cross-entropy loss. We also employ relativistic rules to stabilize the training of GAN. Then, we applied it to the least square and Dragan losses. Our experiments suggest that the proposed method improve the quality of speech better than the least-square loss on several objective quality metrics.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 5th International Conference on Networking, Information Systems and Security: Envisage Intelligent Systems in 5g//6G-based Interconnected Digital Worlds (NISS)

自引率

0.00%

发文量