Applying compensation techniques on i-vectors extracted from short-test utterances for speaker verification using deep neural network
Il-Ho Yang, Hee-Soo Heo, Sung-Hyun Yoon, Ha-jin Yu
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2017-03-08
DOI: 10.1109/ICASSP.2017.7953206 (https://doi.org/10.1109/ICASSP.2017.7953206)
Citations: 5
Abstract
We propose a method to improve speaker verification performance when a test utterance is very short. In some situations with short test utterances, the performance of i-vector/probabilistic linear discriminant analysis systems degrades. The proposed method transforms short-utterance i-vectors into compensated vectors using a deep neural network. To reduce the dimensionality of the search space, we extract several principal components from the residual vectors between every long-utterance i-vector in a development set and its truncated short-utterance i-vector. An input i-vector to the network is then transformed by a linear combination of these directions, where the network outputs correspond to the weights of the linear combination of the principal components. We use public speech databases to evaluate the method. The experimental results on the short2-10sec condition (det6, male portion) of the NIST 2008 speaker recognition evaluation corpus show that the proposed method reduces the minimum detection cost relative to the baseline system, which uses linear discriminant analysis transformed i-vectors as features.
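The compensation scheme the abstract outlines can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy dimensions, and the random weight vector standing in for the trained DNN's outputs are all assumptions for the sake of the example. The two real ingredients it mirrors are (1) PCA over residuals between paired long- and short-utterance i-vectors from a development set, and (2) adding a weighted combination of those principal directions to a short-utterance i-vector.

```python
import numpy as np

def principal_directions(long_iv, short_iv, k):
    """PCA on the residuals between paired long- and truncated
    short-utterance i-vectors from a development set.
    Returns the top-k principal directions as rows of a (k, dim) matrix."""
    residuals = long_iv - short_iv                # (n_pairs, dim)
    residuals = residuals - residuals.mean(axis=0)
    # Right singular vectors of the centered residuals are the
    # principal components of the residual cloud.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    return vt[:k]

def compensate(short_vec, directions, weights):
    """Transform a short-utterance i-vector by adding a linear
    combination of the principal directions. In the paper the
    weights are produced by a DNN; here `weights` is a stand-in."""
    return short_vec + weights @ directions

# Toy data: 100 paired i-vectors of dimension 400 (both values assumed).
rng = np.random.default_rng(0)
long_iv = rng.standard_normal((100, 400))
short_iv = long_iv + 0.3 * rng.standard_normal((100, 400))

P = principal_directions(long_iv, short_iv, k=8)   # (8, 400)
w = rng.standard_normal(8)                         # placeholder for DNN output
compensated = compensate(short_iv[0], P, w)        # (400,)
```

Constraining the correction to a low-dimensional span of residual principal components is what keeps the network's output small (k weights rather than a full 400-dimensional shift), which is the "reduce the dimensionality of the search space" step in the abstract.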