Applying compensation techniques on i-vectors extracted from short-test utterances for speaker verification using deep neural network
Il-Ho Yang, Hee-Soo Heo, Sung-Hyun Yoon, Ha-jin Yu
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2017-03-08
DOI: 10.1109/ICASSP.2017.7953206 (https://doi.org/10.1109/ICASSP.2017.7953206)
Citations: 5
Abstract
We propose a method to improve speaker verification performance when a test utterance is very short. In some situations with short test utterances, the performance of i-vector/probabilistic linear discriminant analysis systems degrades. The proposed method transforms short-utterance i-vectors into compensated vectors using a deep neural network. To reduce the dimensionality of the search space, we extract several principal components from the residual vectors between every long-utterance i-vector in a development set and its truncated short-utterance i-vector. An input i-vector to the network is then transformed by a linear combination of these directions, where the network outputs correspond to the weights of the linear combination of the principal components. We use public speech databases to evaluate the method. The experimental results on the short2-10sec condition (det6, male portion) of the NIST 2008 speaker recognition evaluation corpus show that the proposed method reduces the minimum detection cost relative to the baseline system, which uses linear discriminant analysis transformed i-vectors as features.
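The compensation scheme the abstract outlines can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy dimensions, and the random weight vector standing in for the trained DNN's outputs are all assumptions for the sake of the example. The two real ingredients it mirrors are (1) PCA over residuals between paired long- and short-utterance i-vectors from a development set, and (2) adding a weighted combination of those principal directions to a short-utterance i-vector.

```python
import numpy as np

def principal_directions(long_iv, short_iv, k):
    """PCA on the residuals between paired long- and truncated
    short-utterance i-vectors from a development set.
    Returns the top-k principal directions as rows of a (k, dim) matrix."""
    residuals = long_iv - short_iv                # (n_pairs, dim)
    residuals = residuals - residuals.mean(axis=0)
    # Right singular vectors of the centered residuals are the
    # principal components of the residual cloud.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    return vt[:k]

def compensate(short_vec, directions, weights):
    """Transform a short-utterance i-vector by adding a linear
    combination of the principal directions. In the paper the
    weights are produced by a DNN; here `weights` is a stand-in."""
    return short_vec + weights @ directions

# Toy data: 100 paired i-vectors of dimension 400 (both values assumed).
rng = np.random.default_rng(0)
long_iv = rng.standard_normal((100, 400))
short_iv = long_iv + 0.3 * rng.standard_normal((100, 400))

P = principal_directions(long_iv, short_iv, k=8)   # (8, 400)
w = rng.standard_normal(8)                         # placeholder for DNN output
compensated = compensate(short_iv[0], P, w)        # (400,)
```

Constraining the correction to a low-dimensional span of residual principal components is what keeps the network's output small (k weights rather than a full 400-dimensional shift), which is the "reduce the dimensionality of the search space" step in the abstract.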