Title: A Loss With Mixed Penalty for Speech Enhancement Generative Adversarial Network
Authors: Jie Cao, Yaofeng Zhou, Hong Yu, Xiaoxu Li, Dan Wang, Zhanyu Ma
Venue: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
DOI: 10.1109/APSIPAASC47483.2019.9023273
Publication date: 2019-11-01
Citations: 0
Abstract
Speech enhancement based on generative adversarial networks (GANs) can overcome the problems of many classical speech enhancement methods, such as relying on the first-order statistics of signals and ignoring the phase mismatch between the noisy and the clean signals. However, GANs are hard to train and suffer from the vanishing gradient problem, which may lead to poor generated samples. In this paper, we propose a relativistic average least-squares loss function with a mixed penalty term for a speech enhancement generative adversarial network. The mixed penalty term minimizes the distance between generated and clean samples more effectively. Experimental results on the Valentini 2016 and Valentini 2017 datasets show that the proposed loss makes GAN training more stable and achieves good performance in both objective and subjective evaluation.
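To make the loss concrete, the sketch below shows a relativistic average least-squares (RaLS) GAN objective in PyTorch, with an added mixed penalty pulling the enhanced waveform toward the clean reference. The abstract does not specify the exact form of the mixed penalty or its weights, so the L1+L2 combination and the lambda_l1/lambda_l2 values here are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch: relativistic average least-squares (RaLS) GAN losses plus an
# ASSUMED mixed L1+L2 penalty between enhanced and clean waveforms. The penalty
# form and weights are illustrative; the paper's exact terms are not given here.
import torch


def rals_d_loss(d_real, d_fake):
    """Discriminator loss: real scores are pushed above the average fake score,
    fake scores below the average real score (least-squares form)."""
    return ((d_real - d_fake.mean() - 1.0) ** 2).mean() + \
           ((d_fake - d_real.mean() + 1.0) ** 2).mean()


def rals_g_loss(d_real, d_fake, enhanced, clean,
                lambda_l1=100.0, lambda_l2=100.0):
    """Generator loss: adversarial RaLS term plus an assumed mixed L1+L2
    penalty between the enhanced and clean waveforms."""
    adv = ((d_fake - d_real.mean() - 1.0) ** 2).mean() + \
          ((d_real - d_fake.mean() + 1.0) ** 2).mean()
    penalty = lambda_l1 * (enhanced - clean).abs().mean() + \
              lambda_l2 * ((enhanced - clean) ** 2).mean()
    return adv + penalty


if __name__ == "__main__":
    # Toy shapes: a batch of 4 one-second 16 kHz waveforms and scalar D outputs.
    clean = torch.randn(4, 16000)
    enhanced = clean + 0.1 * torch.randn(4, 16000)
    d_real = torch.randn(4, 1)
    d_fake = torch.randn(4, 1)
    print(rals_d_loss(d_real, d_fake).item())
    print(rals_g_loss(d_real, d_fake, enhanced, clean).item())
```

In the relativistic average formulation the discriminator only judges whether a sample is more realistic than the average sample of the opposite class, which is the property usually credited with more stable training than a standard least-squares GAN; the reconstruction penalty then anchors the generator's output to the clean target.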