说话人验证的前端因子分析

2018 International Conference on Communications (COMM) Pub Date : 2018-06-01 DOI:10.1109/ICCOMM.2018.8453731

Florin Curelaru

{"title":"说话人验证的前端因子分析","authors":"Florin Curelaru","doi":"10.1109/ICCOMM.2018.8453731","DOIUrl":null,"url":null,"abstract":"It is known that the performance of the i-vectors/PLDA based speaker verification systems is affected in the cases of short utterances and limited training data. The performance degradation appears because the shorter the utterance, the less reliable the extracted i-vector is, and because the total variability covariance matrix and the underlying PLDA matrices need a significant amount of data to be robustly estimated. Considering the “MIT Mobile Device Speaker Verification Corpus” (MIT-MDSVC) as a representative dataset for robust speaker verification tasks on limited amount of training data, this paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification. The i-vectors/PLDA based system achieved good performance only when the total variability matrix and the underlying PLDA matrices were trained with data belonging to the enrolled speakers. This way of training means that the system should be fully retrained when new enrolled speakers were added. The performance of the system was more sensitive to the amount of training data of the underlying PLDA matrices than to the amount of training data of the total variability matrix. Overall, the Equal Error Rate performance of the i-vectors/PLDA based system was around 1% below the performance of a GMM-UBM system on the chosen dataset. The paper presents at the end some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used besides utterances from MIT-MDSVC for training the total variability covariance matrix and the underlying PLDA matrices.","PeriodicalId":158890,"journal":{"name":"2018 International Conference on Communications (COMM)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1262","resultStr":"{\"title\":\"Front-End Factor Analysis For Speaker Verification\",\"authors\":\"Florin Curelaru\",\"doi\":\"10.1109/ICCOMM.2018.8453731\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"It is known that the performance of the i-vectors/PLDA based speaker verification systems is affected in the cases of short utterances and limited training data. The performance degradation appears because the shorter the utterance, the less reliable the extracted i-vector is, and because the total variability covariance matrix and the underlying PLDA matrices need a significant amount of data to be robustly estimated. Considering the “MIT Mobile Device Speaker Verification Corpus” (MIT-MDSVC) as a representative dataset for robust speaker verification tasks on limited amount of training data, this paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification. The i-vectors/PLDA based system achieved good performance only when the total variability matrix and the underlying PLDA matrices were trained with data belonging to the enrolled speakers. This way of training means that the system should be fully retrained when new enrolled speakers were added. The performance of the system was more sensitive to the amount of training data of the underlying PLDA matrices than to the amount of training data of the total variability matrix. Overall, the Equal Error Rate performance of the i-vectors/PLDA based system was around 1% below the performance of a GMM-UBM system on the chosen dataset. The paper presents at the end some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used besides utterances from MIT-MDSVC for training the total variability covariance matrix and the underlying PLDA matrices.\",\"PeriodicalId\":158890,\"journal\":{\"name\":\"2018 International Conference on Communications (COMM)\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1262\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Conference on Communications (COMM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCOMM.2018.8453731\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Communications (COMM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCOMM.2018.8453731","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1262

摘要

在短话语和有限训练数据的情况下，基于i-vectors/PLDA的说话人验证系统的性能会受到影响。出现性能下降的原因是，语音越短，提取的i向量就越不可靠，而且总可变性协方差矩阵和底层PLDA矩阵需要大量的数据来进行稳健估计。考虑到“MIT移动设备说话人验证语料库”(MIT- mdsvc)作为在有限数量的训练数据上鲁棒说话人验证任务的代表性数据集，本文研究了哪种配置和哪些参数导致基于i-vectors/PLDA的说话人验证的最佳性能。基于i-vectors/PLDA的系统只有在总变异性矩阵和底层PLDA矩阵被注册说话人的数据训练时才能获得良好的性能。这种培训方式意味着，在增加新的登记发言者时，该系统应得到充分的再培训。系统的性能对底层PLDA矩阵的训练数据量比对总变异性矩阵的训练数据量更敏感。总体而言，基于i-vectors/PLDA的系统的等错误率性能比GMM-UBM系统在所选数据集上的性能低1%左右。本文最后给出了一些初步实验，在MIT-MDSVC的基础上，使用CSTR VCTK语料中的话语来训练总变异协方差矩阵和底层PLDA矩阵。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Front-End Factor Analysis For Speaker Verification

It is known that the performance of the i-vectors/PLDA based speaker verification systems is affected in the cases of short utterances and limited training data. The performance degradation appears because the shorter the utterance, the less reliable the extracted i-vector is, and because the total variability covariance matrix and the underlying PLDA matrices need a significant amount of data to be robustly estimated. Considering the “MIT Mobile Device Speaker Verification Corpus” (MIT-MDSVC) as a representative dataset for robust speaker verification tasks on limited amount of training data, this paper investigates which configuration and which parameters lead to the best performance of an i-vectors/PLDA based speaker verification. The i-vectors/PLDA based system achieved good performance only when the total variability matrix and the underlying PLDA matrices were trained with data belonging to the enrolled speakers. This way of training means that the system should be fully retrained when new enrolled speakers were added. The performance of the system was more sensitive to the amount of training data of the underlying PLDA matrices than to the amount of training data of the total variability matrix. Overall, the Equal Error Rate performance of the i-vectors/PLDA based system was around 1% below the performance of a GMM-UBM system on the chosen dataset. The paper presents at the end some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used besides utterances from MIT-MDSVC for training the total variability covariance matrix and the underlying PLDA matrices.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 International Conference on Communications (COMM)

自引率

0.00%

发文量