Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition
Qing Wang, Wei Rao, Sining Sun, Lei Xie, Chng Eng Siong, Haizhou Li
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4889-4893, 2018. DOI: 10.1109/ICASSP.2018.8461423
The i-vector approach to speaker recognition performs well when the evaluation data come from a domain similar to that of the training data. In real-world applications, however, there is always a mismatch between the training and evaluation datasets, which degrades performance. To address this problem, this paper proposes learning domain-invariant and speaker-discriminative speech representations via domain adversarial training. Specifically, we use a gradient reversal layer to remove domain variation and project data from different domains into the same subspace. We also compare the proposed method with other state-of-the-art unsupervised domain adaptation techniques for the i-vector approach, such as autoencoder-based domain adaptation, inter-dataset variability compensation, and dataset-invariant covariance normalization. Experiments on the 2013 Domain Adaptation Challenge (DAC) dataset show that the proposed method is not only effective in addressing the dataset mismatch problem but also outperforms the compared unsupervised domain adaptation methods.
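The abstract describes the mechanism only at a high level: a shared feature extractor feeds a speaker classifier and, through a gradient reversal layer (GRL), a domain classifier. The following is a minimal NumPy sketch of that generic gradient-reversal construction (as used in domain-adversarial neural networks), not the authors' implementation. The single tanh layer, layer sizes, lambda, learning rate, and the synthetic "i-vector-like" data are all hypothetical. The GRL is the identity in the forward pass and multiplies the incoming gradient by -lambda in the backward pass, so one plain gradient-descent step trains the domain classifier to separate source from target while pushing the shared features to confuse it.

```python
import numpy as np

# Minimal sketch of domain adversarial training with a gradient reversal
# layer (GRL). Architecture, sizes, lambda, and learning rate are
# hypothetical choices for illustration, not the paper's settings.


def grl_forward(z):
    """GRL forward pass: the identity."""
    return z


def grl_backward(grad, lam):
    """GRL backward pass: reverse the gradient and scale it by lambda."""
    return -lam * grad


def softmax(a):
    a = a - a.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def forward_backward(params, xs, ys, xt, lam):
    """Losses and gradients for one full-batch step.

    xs, ys: labeled source-domain vectors and their speaker ids.
    xt: unlabeled target-domain vectors (no speaker labels used).
    """
    Wf, Wy, wd = params["Wf"], params["Wy"], params["wd"]
    x = np.vstack([xs, xt])
    dom = np.concatenate([np.zeros(len(xs)), np.ones(len(xt))])  # 0=source, 1=target
    h = np.tanh(x @ Wf)  # shared feature extractor
    hs = h[: len(xs)]

    # Speaker classifier: softmax cross-entropy on source data only.
    p = softmax(hs @ Wy)
    onehot = np.eye(Wy.shape[1])[ys]
    spk_loss = -np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1))
    g_spk = (p - onehot) / len(xs)  # dL_spk / d(speaker logits)

    # Domain classifier: logistic loss on all data, placed behind the GRL.
    q = sigmoid(grl_forward(h) @ wd)
    dom_loss = -np.mean(dom * np.log(q + 1e-12) + (1 - dom) * np.log(1 - q + 1e-12))
    g_dom = (q - dom) / len(x)  # dL_dom / d(domain logits)

    grads = {"Wy": hs.T @ g_spk, "wd": h.T @ g_dom}
    # The gradient reaching the features from the domain branch is reversed.
    g_h = grl_backward(np.outer(g_dom, wd), lam)
    g_h[: len(xs)] += g_spk @ Wy.T  # plus the ordinary speaker gradient
    grads["Wf"] = x.T @ (g_h * (1 - h ** 2))  # back through tanh
    return spk_loss, dom_loss, grads


def train(xs, ys, xt, n_spk, hidden=16, lam=0.1, lr=0.5, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    params = {
        "Wf": rng.normal(scale=0.1, size=(xs.shape[1], hidden)),
        "Wy": rng.normal(scale=0.1, size=(hidden, n_spk)),
        "wd": rng.normal(scale=0.1, size=hidden),
    }
    for _ in range(steps):
        spk_loss, dom_loss, grads = forward_backward(params, xs, ys, xt, lam)
        # Plain gradient descent on everything: wd minimizes dom_loss, while
        # the GRL makes Wf minimize spk_loss - lam * dom_loss.
        for k in params:
            params[k] -= lr * grads[k]
    return params, spk_loss, dom_loss


# Usage on synthetic data: the target domain is the source shifted by +1.5.
rng = np.random.default_rng(1)
centers = rng.normal(size=(4, 10))
ys = np.repeat(np.arange(4), 25)
xs = centers[ys] + 0.3 * rng.normal(size=(100, 10))
xt = centers[rng.integers(0, 4, 100)] + 1.5 + 0.3 * rng.normal(size=(100, 10))
params, spk_loss, dom_loss = train(xs, ys, xt, n_spk=4)
features_t = np.tanh(xt @ params["Wf"])  # adapted target representations
```

Because the target vectors enter only the domain branch, no target speaker labels are needed, which is what makes this kind of adaptation unsupervised. The learned features (here `tanh(x @ Wf)`) play the role of the shared subspace the abstract mentions; how the paper feeds them to its back-end scorer is not stated in this record.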