A robust DBN-vector based speaker verification system under channel mismatch conditions

2016 IEEE International Conference on Digital Signal Processing (DSP). Pub Date: 2016-10-01. DOI: 10.1109/ICDSP.2016.7868523

Disong Wang, Yuexian Zou, J. Liu, Y. Huang

Citations: 4

Abstract

Channel variability is one of the greatest challenges for speaker verification (SV). Techniques in the feature, model, and score domains have been applied to mitigate the impact of the channel. In this paper, we study robust deep feature learning with a deep belief network (DBN) that takes traditional spectral features such as MFCC or PLP as input. In the training phase, a DBN is trained to map spectral features to the corresponding speaker identity; deep features are then taken from the kth hidden layer, where k is determined by maximizing the ratio between within-class distance and between-class distance. In the enrollment phase, the trained DBN extracts deep features at the kth hidden layer, and these features are averaged to form the kth-DBN-vector. In the test phase, a kth-DBN-vector is extracted from the test utterance and compared with the enrolled kth-DBN-vector to make a verification decision. To validate the effectiveness of the learned DBN-vectors for speaker verification, extensive experiments were conducted on Mandarin corpora. The proposed DBN-vector based SV system outperforms the state-of-the-art i-vector based SV system under channel mismatch conditions in terms of equal error rate (EER) and minimum detection cost function (minDCF).
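The enrollment and test phases described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: a plain sigmoid feed-forward pass stands in for the trained DBN, cosine similarity stands in for the unspecified comparison between DBN-vectors, and the layer-selection criterion is taken as between-class over within-class Euclidean distance, since the abstract does not state the direction of the ratio or the distance measure.

```python
import numpy as np

def hidden_activations(frames, weights, biases, k):
    """Pass frames (n_frames, n_dims) through the first k sigmoid layers.
    A stand-in for the trained DBN; weights and biases are hypothetical."""
    h = frames
    for W, b in zip(weights[:k], biases[:k]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h

def dbn_vector(frames, weights, biases, k):
    """Average the k-th hidden-layer features over all frames of an utterance."""
    return hidden_activations(frames, weights, biases, k).mean(axis=0)

def cosine_score(enrolled, test):
    """Assumed scoring rule: cosine similarity between two DBN-vectors."""
    return float(enrolled @ test / (np.linalg.norm(enrolled) * np.linalg.norm(test)))

def distance_ratio(vectors, labels):
    """Assumed criterion: mean centroid-to-global-mean distance (between-class)
    divided by mean vector-to-own-centroid distance (within-class)."""
    vectors, labels = np.asarray(vectors), np.asarray(labels)
    centroids = {s: vectors[labels == s].mean(axis=0) for s in np.unique(labels)}
    within = np.mean([np.linalg.norm(v - centroids[s]) for v, s in zip(vectors, labels)])
    global_mean = vectors.mean(axis=0)
    between = np.mean([np.linalg.norm(c - global_mean) for c in centroids.values()])
    return between / within

def select_layer(utterances, labels, weights, biases):
    """Return the 1-based layer index k with the largest distance ratio."""
    ratios = [
        distance_ratio([dbn_vector(u, weights, biases, k) for u in utterances], labels)
        for k in range(1, len(weights) + 1)
    ]
    return int(np.argmax(ratios)) + 1, ratios
```

In use, `select_layer` would be run once on labeled training utterances to fix k. After that, `dbn_vector` produces one vector per enrollment or test utterance, and a verification decision follows from thresholding `cosine_score`; the threshold itself would have to be tuned on held-out data.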
