Speech Enhancement Using Augmented SSL CycleGAN

2022 30th European Signal Processing Conference (EUSIPCO) Pub Date : 2022-08-29 DOI:10.23919/eusipco55093.2022.9909754

B. Popović, Lidija Krstanović, M. Janev, S. Suzic, Tijana V. Nosek, J. Galic


Citations: 1

Abstract

The purpose of single-channel speech enhancement is to attenuate the noise component of noisy speech in order to increase the intelligibility and perceived quality of the speech component. One such approach uses deep neural networks to transform noisy speech features into clean speech features by minimizing the mean squared error between the degraded and the clean features using paired datasets. Most recently, an approach based on unpaired datasets, CycleGAN speech enhancement, was proposed, obtaining state-of-the-art results even though no supervision was used during training. Also, only a small amount of noisy speech data is usually accessible in comparison to clean speech. Therefore, in this paper, an augmented semi-supervised CycleGAN speech enhancement algorithm is proposed, where only a small percentage of the training database contains actual paired data. As a consequence, this prevents overfitting of the discriminator corresponding to the scarce noisy speech domain during the initial training stages. The method also augments that discriminator by periodically adding clean speech samples, transformed by the inverse network, into its pool. Significantly better results in terms of several standard measures are obtained using the proposed augmented semi-supervised method in comparison to the baseline CycleGAN speech enhancement approach operating on a reduced noisy speech domain.
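The pool augmentation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the augmentation period, the pool data structure, or the network architectures, so `augment_noisy_pool`, `toy_inverse_generator`, the `period` parameter, and the list-of-floats feature representation are all hypothetical stand-ins chosen only to show the control flow.

```python
from typing import Callable, List, Sequence

# Assumption: one utterance is represented as a flat list of feature values.
Features = List[float]


def augment_noisy_pool(
    noisy_pool: List[Features],
    clean_batch: Sequence[Features],
    inverse_generator: Callable[[Features], Features],
    step: int,
    period: int,
) -> bool:
    """Every `period` training steps, map clean samples to the noisy domain
    with the inverse (clean -> noisy) network and append them to the pool
    seen by the noisy-domain discriminator.

    Returns True if the pool was augmented at this step, else False.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if step % period != 0:
        return False
    # Generated noisy-like samples enlarge the scarce noisy-domain pool.
    noisy_pool.extend(inverse_generator(x) for x in clean_batch)
    return True


def toy_inverse_generator(clean: Features) -> Features:
    """Hypothetical placeholder for the inverse network: adds a fixed offset.
    A real system would use a trained neural network here."""
    return [value + 0.5 for value in clean]
```

In a training loop, a call such as `augment_noisy_pool(pool, clean_batch, toy_inverse_generator, step, period=3)` would be made once per step, so the pool grows only on steps that are multiples of the period. How often to augment, and how many generated samples to add, are design choices the abstract leaves open.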
