B. Popović, Lidija Krstanović, M. Janev, S. Suzic, Tijana V. Nosek, J. Galic
Speech Enhancement Using Augmented SSL CycleGAN
2022 30th European Signal Processing Conference (EUSIPCO), 2022-08-29
https://doi.org/10.23919/eusipco55093.2022.9909754
The purpose of single-channel speech enhancement is to attenuate the noise component of noisy speech in order to increase the intelligibility and perceived quality of the speech component. One such approach uses deep neural networks to transform noisy speech features into clean speech by minimizing the mean squared error between the degraded and the clean features using paired datasets. Most recently, an approach based on unpaired datasets, CycleGAN speech enhancement, was proposed, obtaining state-of-the-art results even though no supervision was used during the actual training. Also, only a small amount of noisy speech data is usually accessible in comparison to clean speech. Therefore, in this paper, an augmented semi-supervised CycleGAN speech enhancement algorithm is proposed, in which only a small percentage of the training database contains actual paired data. As a consequence, the method prevents overfitting of the discriminator corresponding to the scarce noisy speech domain during the initial training stages, and it also augments that discriminator by periodically adding clean speech samples, transformed by the inverse network, into its pool. Significantly better results in terms of several standard measures are obtained using the proposed augmented semi-supervised method in comparison to the baseline CycleGAN speech enhancement approach operating on a reduced noisy speech domain.
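The pool augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `NoisyDomainPool`, `make_toy_inverse_generator`, `run_schedule`, `augment_every`, and `batch_size` are hypothetical, the "inverse network" is replaced by a toy function that adds Gaussian noise, and it is assumed that transformed samples are simply appended to the pool of the noisy-domain discriminator. Only the pool bookkeeping is shown; the actual generator and discriminator updates are omitted.

```python
import random
from typing import Callable, List, Sequence

Features = List[float]  # stand-in for one utterance's feature vector


class NoisyDomainPool:
    """Hypothetical pool of samples available to the noisy-domain discriminator."""

    def __init__(self, real_noisy: Sequence[Features], max_size: int = 1000):
        self.samples: List[Features] = [list(x) for x in real_noisy]
        self.max_size = max_size

    def augment(self, clean_batch: Sequence[Features],
                inverse_generator: Callable[[Features], Features]) -> int:
        """Map clean samples to the noisy domain and add them; return count added."""
        added = 0
        for clean in clean_batch:
            if len(self.samples) >= self.max_size:
                break
            self.samples.append(inverse_generator(clean))
            added += 1
        return added


def make_toy_inverse_generator(noise_scale: float = 0.1, seed: int = 1):
    """Toy stand-in for the clean-to-noisy network (assumption: additive noise)."""
    rng = random.Random(seed)

    def inverse_generator(clean: Features) -> Features:
        return [v + rng.gauss(0.0, noise_scale) for v in clean]

    return inverse_generator


def run_schedule(pool: NoisyDomainPool, clean_data: Sequence[Features],
                 inverse_generator: Callable[[Features], Features],
                 steps: int, augment_every: int, batch_size: int,
                 seed: int = 0) -> List[int]:
    """Simulate training steps and record the pool size after each step."""
    rng = random.Random(seed)
    sizes = []
    for step in range(1, steps + 1):
        # Generator and discriminator updates would happen here.
        if step % augment_every == 0:
            batch = rng.sample(list(clean_data), min(batch_size, len(clean_data)))
            pool.augment(batch, inverse_generator)
        sizes.append(len(pool.samples))
    return sizes
```

In this sketch the pool grows only at every `augment_every`-th step, so the scarce noisy domain is gradually supplemented from the abundant clean data. How often to augment and how many samples to add are free parameters here and are not taken from the paper.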