A segmental DNN/i-vector approach for digit-prompted speaker verification
Jie Yan, Lei Xie, Guangsen Wang, Zhonghua Fu
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2017
DOI: 10.1109/APSIPA.2017.8281992
Citations: 1
Abstract
DNN/i-vector systems have achieved state-of-the-art performance in text-independent speaker verification. In such systems, the UBM posteriors are replaced with DNN posteriors when training the i-vector extractor, so that the phonetic space is modeled better. However, DNN/i-vector systems have had limited success in text-dependent speaker verification, because lexical variability, which is important for such applications, is suppressed in utterance-level i-vectors. In this paper, we propose a segmental DNN/i-vector approach for the digit-prompted speaker verification task. Specifically, we segment each utterance into digits and model each digit with its own DNN/i-vector system. By modeling the variability of each digit independently, we can focus more on the speaker characteristics within each digit. To account for uncertainty in the DNN posteriors, we propose a weighting method based on a confidence measure. On the RSR2015 dataset, the proposed approach yields an equal error rate (EER) of 3.44%, compared with 5.76% for the baseline utterance-level DNN/i-vector system and 4.54% for the joint factor analysis (JFA) system.
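The abstract describes the pipeline only at a high level. As a rough illustration, and not the authors' implementation, the NumPy sketch below shows three pieces: how zeroth- and first-order Baum-Welch statistics are accumulated from frame posteriors (the point where a DNN/i-vector system substitutes DNN outputs for UBM posteriors), how per-digit scores might be combined with confidence weights, and how an EER is computed from target and non-target scores. All function names are hypothetical. The confidence measure (mean of the per-frame maximum posterior), cosine scoring, and normalized linear weighting are assumptions chosen for illustration; the abstract does not specify the paper's actual confidence measure or scoring back-end. Training the total variability matrix and extracting i-vectors from the statistics is omitted.

```python
import numpy as np


def baum_welch_stats(features, posteriors):
    """Zeroth- and first-order statistics for one segment (e.g. one digit).

    features:   (T, D) acoustic feature frames.
    posteriors: (T, C) per-frame posteriors over C classes. In a GMM/i-vector
                system these come from the UBM; in a DNN/i-vector system they
                come from the DNN output layer instead.
    Returns N with shape (C,) and uncentered F with shape (C, D).
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    n = posteriors.sum(axis=0)        # N_c = sum_t gamma_t(c)
    f = posteriors.T @ features       # F_c = sum_t gamma_t(c) * x_t
    return n, f


def segment_confidence(posteriors):
    """Assumption (hypothetical measure): mean over frames of the top posterior."""
    return float(np.asarray(posteriors, dtype=float).max(axis=1).mean())


def cosine_score(enroll_ivec, test_ivec):
    """Assumption: cosine similarity between enrollment and test i-vectors."""
    a = np.asarray(enroll_ivec, dtype=float)
    b = np.asarray(test_ivec, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fuse_digit_scores(scores, confidences):
    """Assumption: weighted sum of per-digit scores, weights proportional to confidence."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                   # normalize weights so they sum to 1
    return float(w @ np.asarray(scores, dtype=float))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([tgt, non])):
        frr = np.mean(tgt < t)        # false rejections: targets below threshold
        far = np.mean(non >= t)       # false acceptances: non-targets at or above it
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return float(eer)
```

Normalizing the weights keeps the fused score on the same scale as the per-digit scores, and with equal confidences the fusion reduces to a plain average, which corresponds to treating every digit as equally reliable.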