基于循环gan的低资源域自适应说话人识别

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Pub Date : 2019-10-25 DOI:10.1109/ASRU46091.2019.9003748

P. S. Nidadavolu, Saurabh Kataria, J. Villalba, N. Dehak

{"title":"基于循环gan的低资源域自适应说话人识别","authors":"P. S. Nidadavolu, Saurabh Kataria, J. Villalba, N. Dehak","doi":"10.1109/ASRU46091.2019.9003748","DOIUrl":null,"url":null,"abstract":"Current speaker recognition technology provides great performance with the x-vector approach. However, performance decreases when the evaluation domain is different from the training domain, an issue usually addressed with domain adaptation approaches. Recently, unsupervised domain adaptation using cycle-consistent Generative Adversarial Networks (CycleGAN) has received a lot of attention. Cycle-GAN learn mappings between features of two domains given non-parallel data. We investigate their effectiveness in low resource scenario i.e. when limited amount of target domain data is available for adaptation, a case unexplored in previous works. We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance. Number of speakers present in source and target domains are 7000 and 191 respectively. By adding noise to the target domain during CycleGAN training, we were able to achieve better performance compared to the adaptation system whose CycleGAN was trained on a larger target data. On reverberant to clean adaptation task, our models improved EER by 18.3% relative on VOiCES dataset compared to a system trained on clean data. They also slightly improved over the state-of-the-art Weighted Prediction Error (WPE) de-reverberation algorithm.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-Gans\",\"authors\":\"P. S. Nidadavolu, Saurabh Kataria, J. Villalba, N. Dehak\",\"doi\":\"10.1109/ASRU46091.2019.9003748\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Current speaker recognition technology provides great performance with the x-vector approach. However, performance decreases when the evaluation domain is different from the training domain, an issue usually addressed with domain adaptation approaches. Recently, unsupervised domain adaptation using cycle-consistent Generative Adversarial Networks (CycleGAN) has received a lot of attention. Cycle-GAN learn mappings between features of two domains given non-parallel data. We investigate their effectiveness in low resource scenario i.e. when limited amount of target domain data is available for adaptation, a case unexplored in previous works. We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance. Number of speakers present in source and target domains are 7000 and 191 respectively. By adding noise to the target domain during CycleGAN training, we were able to achieve better performance compared to the adaptation system whose CycleGAN was trained on a larger target data. On reverberant to clean adaptation task, our models improved EER by 18.3% relative on VOiCES dataset compared to a system trained on clean data. They also slightly improved over the state-of-the-art Weighted Prediction Error (WPE) de-reverberation algorithm.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003748\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003748","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

摘要

当前的说话人识别技术采用的是x向量方法，具有很好的性能。然而，当评估领域与训练领域不同时，性能会下降，这个问题通常通过领域适应方法来解决。近年来，基于循环一致生成对抗网络(CycleGAN)的无监督域自适应算法受到了广泛关注。循环gan学习给定非并行数据的两个域特征之间的映射。我们研究了它们在低资源情况下的有效性，即当有限数量的目标领域数据可用于适应时，这是以前工作中未探索的情况。我们实验了两个自适应任务:麦克风到电话和一种新的混响到清洁自适应，最终目标是提高说话人识别性能。源域和目标域中分别有7000人和191人。通过在CycleGAN训练过程中在目标域中加入噪声，与在更大的目标数据上训练CycleGAN的自适应系统相比，我们能够获得更好的性能。在混响到清洁的适应任务中，我们的模型相对于在清洁数据上训练的系统，在voice数据集上提高了18.3%的EER。他们还稍微改进了最先进的加权预测误差(WPE)去混响算法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-Gans

Current speaker recognition technology provides great performance with the x-vector approach. However, performance decreases when the evaluation domain is different from the training domain, an issue usually addressed with domain adaptation approaches. Recently, unsupervised domain adaptation using cycle-consistent Generative Adversarial Networks (CycleGAN) has received a lot of attention. Cycle-GAN learn mappings between features of two domains given non-parallel data. We investigate their effectiveness in low resource scenario i.e. when limited amount of target domain data is available for adaptation, a case unexplored in previous works. We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance. Number of speakers present in source and target domains are 7000 and 191 respectively. By adding noise to the target domain during CycleGAN training, we were able to achieve better performance compared to the adaptation system whose CycleGAN was trained on a larger target data. On reverberant to clean adaptation task, our models improved EER by 18.3% relative on VOiCES dataset compared to a system trained on clean data. They also slightly improved over the state-of-the-art Weighted Prediction Error (WPE) de-reverberation algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

自引率

0.00%

发文量