{"title":"A New Method for Improving Generative Adversarial Networks in Speech Enhancement","authors":"Fan Yang, Junfeng Li, Yonghong Yan","doi":"10.1109/ISCSLP49672.2021.9362057","DOIUrl":null,"url":null,"abstract":"Recent advances in deep learning-based speech enhancement techniques have shown promising prospects over most traditional methods. Generative adversarial networks (GANs), as a recent breakthrough in deep learning, can effectively remove additive noise embedded in speech, improving the perceptual quality [1]. In the existing methods of using GANs to achieve speech enhancement, the discriminator often regards the clean speech signal as real data and the enhanced speech signal as fake data; however, this approach may cause feedback from the discriminator to fail to provide sufficient effective information for the generator to correct its output waveform. In this paper, we propose a new method to use GANs for speech enhancement. This method, by constructing a new learning target for the discriminator, allows the generator to obtain more valuable feed-back, generating more realistic speech signals. In addition, we introduce a new objective, which requires the generator to generate data that matches the statistics of the real data. 
Systematic evaluations and comparisons show that the proposed method yields better performance compared with state-of-art method-s, and achieves better generalization under challenging unseen noise and signal-to-noise ratio (SNR) environments.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP49672.2021.9362057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recent advances in deep learning-based speech enhancement techniques have shown promising prospects over most traditional methods. Generative adversarial networks (GANs), a recent breakthrough in deep learning, can effectively remove additive noise embedded in speech, improving perceptual quality [1]. In existing GAN-based speech enhancement methods, the discriminator typically regards the clean speech signal as real data and the enhanced speech signal as fake data; however, this setup may cause the discriminator's feedback to fail to provide sufficient useful information for the generator to correct its output waveform. In this paper, we propose a new method for using GANs for speech enhancement. By constructing a new learning target for the discriminator, this method allows the generator to obtain more valuable feedback and generate more realistic speech signals. In addition, we introduce a new objective that requires the generator to generate data matching the statistics of the real data. Systematic evaluations and comparisons show that the proposed method yields better performance than state-of-the-art methods, and achieves better generalization under challenging unseen noise and signal-to-noise ratio (SNR) environments.
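The abstract does not give the paper's exact formulas, but the two ideas it references can be sketched generically. Below, `conventional_d_loss` illustrates the setup the abstract critiques (discriminator labels clean speech as real, enhanced speech as fake, here in the common least-squares GAN form), and `statistics_matching_loss` illustrates a feature-matching-style objective that pushes generated data toward the first-order statistics of real data. The least-squares form, the use of mean statistics, and all shapes are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def conventional_d_loss(d_clean, d_enhanced):
    """Conventional least-squares discriminator loss (assumed form):
    clean speech is pushed toward label 1 (real), enhanced speech
    toward label 0 (fake)."""
    real_term = 0.5 * np.mean((d_clean - 1.0) ** 2)
    fake_term = 0.5 * np.mean(d_enhanced ** 2)
    return float(real_term + fake_term)

def statistics_matching_loss(feat_real, feat_fake):
    """Feature-matching-style generator objective (assumed form):
    penalize the squared distance between the mean discriminator
    features of real and generated batches. The paper's actual
    statistic is not specified in the abstract."""
    mean_real = feat_real.mean(axis=0)  # per-feature mean over the batch
    mean_fake = feat_fake.mean(axis=0)
    return float(np.sum((mean_real - mean_fake) ** 2))

# Illustrative usage with toy discriminator outputs / features
d_clean = np.array([0.9, 0.8])     # scores on clean speech
d_enh = np.array([0.2, 0.1])       # scores on enhanced speech
print(conventional_d_loss(d_clean, d_enh))

feat_real = np.random.randn(8, 16)  # hypothetical batch of features
feat_fake = np.random.randn(8, 16)
print(statistics_matching_loss(feat_real, feat_fake))
```

A perfect discriminator under this assumed loss (clean scored 1, enhanced scored 0) drives the loss to zero, while the statistics-matching term gives the generator a gradient signal even when the discriminator's real/fake decision alone is uninformative, which is the failure mode the abstract describes.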