Discriminatively trained Bayesian speaker comparison of i-vectors

2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Pub Date: 2013-10-21. DOI: 10.1109/ICASSP.2013.6639153

B. J. Borgstrom, A. McCree


Citations: 20

Abstract




This paper presents a framework for fully Bayesian speaker comparison of i-vectors. By generalizing the train/test paradigm, we derive an analytic expression for the speaker comparison log-likelihood ratio (LLR), as well as solutions for model training and Bayesian scoring. This framework is useful for enrollment sets of any size. For the specific case of single-cut enrollment, it is shown to be mathematically equivalent to probabilistic linear discriminant analysis (PLDA). Additionally, we present discriminative training of model hyper-parameters by minimizing the total cross entropy between LLRs and class labels. When applied to speaker recognition, significant performance gains are observed for various NIST SRE 2010 extended evaluation tasks.
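The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the paper's analytic LLR, so the code assumes the standard zero-mean two-covariance Gaussian form of PLDA for the single-cut case, with a hypothetical across-speaker covariance `B` and within-speaker covariance `W`. The objective is assumed to be the plain unweighted sum of binary cross entropies over trials, with an assumed `prior_logit` offset. All function and parameter names are illustrative.

```python
import numpy as np


def gaussian_logpdf_zero_mean(z, cov):
    """Log density of N(0, cov) evaluated at the vector z."""
    d = z.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = z @ np.linalg.solve(cov, z)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def two_cov_llr(x1, x2, B, W):
    """LLR of 'same speaker' vs 'different speakers' for one enrollment
    i-vector x1 and one test i-vector x2.

    Assumed model (not taken from the paper): speaker mean y ~ N(0, B),
    i-vector x | y ~ N(y, W).
    """
    T = B + W                      # total covariance of a single i-vector
    zeros = np.zeros_like(B)
    z = np.concatenate([x1, x2])
    # Same speaker: the two i-vectors share y, so they are correlated via B.
    cov_same = np.block([[T, B], [B, T]])
    # Different speakers: independent i-vectors.
    cov_diff = np.block([[T, zeros], [zeros, T]])
    return gaussian_logpdf_zero_mean(z, cov_same) - gaussian_logpdf_zero_mean(z, cov_diff)


def total_cross_entropy(llrs, labels, prior_logit=0.0):
    """Sum over trials of the binary cross entropy between class labels
    (1 = target, 0 = non-target) and the posterior sigmoid(llr + prior_logit)."""
    llrs = np.asarray(llrs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    signs = 2.0 * labels - 1.0     # +1 for target trials, -1 for non-target
    return float(np.sum(np.logaddexp(0.0, -signs * (llrs + prior_logit))))
```

In a discriminative training loop, `B` and `W` (or whatever hyper-parameters the model exposes) would be adjusted to reduce `total_cross_entropy` over a labeled trial list; the optimizer and parameterization are left open here because the abstract does not specify them.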

