A segmental DNN/i-vector approach for digit-prompted speaker verification

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Pub Date: 2017-12-01
DOI: 10.1109/APSIPA.2017.8281992

Jie Yan, Lei Xie, Guangsen Wang, Zhonghua Fu

Citations: 1

Abstract

DNN/i-vector systems have achieved state-of-the-art performance in text-independent speaker verification. In such systems, the UBM posteriors are replaced with DNN posteriors when training the i-vector extractor, to better model the phonetic space. However, DNN/i-vector systems have had limited success in text-dependent speaker verification, because the lexical variabilities that matter for such applications are suppressed in utterance-level i-vectors. In this paper, we propose a segmental DNN/i-vector approach for the digit-prompted speaker verification task. Specifically, we segment the utterance into digits and model each digit with an individual DNN/i-vector system. By modeling the variability of each digit independently, we can focus more on the speaker characteristics of each digit. To account for the uncertainties in the DNN posteriors, we propose a confidence-measure-based weighting method. On the RSR2015 dataset, the proposed approach yields an equal error rate of 3.44%, compared to 5.76% for the baseline utterance-level DNN/i-vector system and 4.54% for the joint factor analysis (JFA) system.
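The abstract does not spell out the scoring rule or the confidence measure, so the following is only a minimal sketch of how per-digit scores might be fused with confidence weights. It assumes cosine scoring between per-digit i-vectors, and it assumes the confidence of a segment is the mean of the per-frame maximum DNN posterior. Both choices, and the names `cosine`, `segment_confidence`, and `fused_score`, are hypothetical illustrations rather than the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def segment_confidence(frame_posteriors: List[Sequence[float]]) -> float:
    """Assumed confidence measure: mean of the per-frame maximum posterior."""
    if not frame_posteriors:
        return 0.0
    return sum(max(p) for p in frame_posteriors) / len(frame_posteriors)


def fused_score(
    enroll: Dict[str, Sequence[float]],
    test_segments: List[Tuple[str, Sequence[float], float]],
) -> float:
    """Confidence-weighted average of per-digit cosine scores.

    enroll maps a digit label to the claimed speaker's i-vector for that digit.
    test_segments holds (digit label, segment i-vector, confidence) triples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for digit, ivector, confidence in test_segments:
        if digit not in enroll:
            continue  # no enrollment model for this digit; skip the segment
        weighted_sum += confidence * cosine(enroll[digit], ivector)
        weight_total += confidence
    return weighted_sum / weight_total if weight_total > 0.0 else 0.0


# Toy example with 2-dimensional "i-vectors" (real ones have hundreds of dimensions).
enroll = {"1": [1.0, 0.0], "2": [0.0, 1.0]}
segments = [
    ("1", [1.0, 0.0], 0.9),  # confident segment that matches enrollment
    ("2", [1.0, 0.0], 0.1),  # low-confidence segment that does not match
]
score = fused_score(enroll, segments)
```

In this toy case the mismatching segment carries little weight, so the fused score stays close to the score of the confident segment. An unweighted average would give both segments equal influence.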
