Bayesian Adversarial Learning for Speaker Recognition

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Pub Date: 2019-12-01. DOI: 10.1109/ASRU46091.2019.9004033

Jen-Tzung Chien, Chun Lin Kuo


Citations: 3

Abstract

This paper presents a new generative adversarial network (GAN) that artificially generates i-vectors to compensate for imbalanced or insufficient data in speaker recognition based on probabilistic linear discriminant analysis. In theory, a GAN can generate artificial data that are misclassified as real data. However, GANs suffer from the mode collapse problem in the two-player optimization over generator and discriminator. This study addresses the problem by improving model regularization through characterizing the weight uncertainty in the GAN. A new Bayesian GAN is implemented to learn a regularized model from diverse data, where strong modes are flattened via marginalization. In particular, we present a variational GAN (VGAN) in which the encoder, generator and discriminator are jointly estimated by variational inference. The computation cost is significantly reduced. To preserve gradient values, a learning objective based on the Wasserstein distance is further introduced. The issues of mode collapse and gradient vanishing are alleviated. Experiments on the NIST i-vector Speaker Recognition Challenge demonstrate the superiority of the proposed VGAN over the variational autoencoder, the standard GAN and the sampling-based Bayesian GAN. The learning efficiency and generation performance are evaluated.
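The two ingredients named in the abstract, weight uncertainty handled by variational inference and a Wasserstein-style objective, can be illustrated with a minimal NumPy sketch. It is not the authors' VGAN. The Gaussian posterior over weights, the standard normal prior, the linear critic, and all function names (`sample_weights`, `kl_to_standard_normal`, `critic_gap`) are assumptions made for illustration only, and the Lipschitz constraint that a Wasserstein critic requires is omitted.

```python
import numpy as np

def sample_weights(mu, log_var, rng):
    """Reparameterized draw w = mu + sigma * eps, with eps ~ N(0, I).

    Assumes a diagonal Gaussian posterior over the weights (hypothetical choice).
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    This term acts as the regularizer in a variational objective.
    """
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

def critic_gap(critic_w, real, fake):
    """Wasserstein-style critic score gap for a linear critic f(x) = x @ w.

    Returns mean score on real samples minus mean score on generated samples.
    The Lipschitz constraint on the critic is not enforced here.
    """
    return float(np.mean(real @ critic_w) - np.mean(fake @ critic_w))

rng = np.random.default_rng(0)
dim = 4                                   # toy dimension, not an i-vector size
mu, log_var = np.zeros(dim), np.zeros(dim)

w = sample_weights(mu, log_var, rng)      # one stochastic weight sample
kl = kl_to_standard_normal(mu, log_var)   # zero when posterior equals prior

real = np.ones((8, dim))                  # stand-in for real vectors
fake = np.zeros((8, dim))                 # stand-in for generated vectors
gap = critic_gap(np.ones(dim), real, fake)
```

Sampling weights instead of fixing them is one common way to express weight uncertainty, and the score gap stays informative for training even when the real and generated samples are far apart, which is the usual motivation for a Wasserstein objective.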

