A robust DBN-vector based speaker verification system under channel mismatch conditions
Disong Wang, Yuexian Zou, J. Liu, Y. Huang
2016 IEEE International Conference on Digital Signal Processing (DSP), 2016-10-01
DOI: 10.1109/ICDSP.2016.7868523 (https://doi.org/10.1109/ICDSP.2016.7868523)
Channel variability is one of the largest challenges for speaker verification (SV). Techniques in the feature, model, and score domains have been applied to mitigate the impact of the channel. In this paper, we study robust deep feature learning with a deep belief network (DBN) that takes traditional spectral features such as MFCC or PLP as input. In the training phase, a DBN is trained to map spectral features to the corresponding speaker identity; deep features are then extracted at the kth hidden layer, where k is determined by maximizing the ratio between within-class distance and between-class distance. In the enrollment phase, the trained DBN is used to extract deep features at the kth hidden layer, and a kth-DBN-vector is formed by averaging these features. In the test phase, a kth-DBN-vector is extracted from the test utterance and compared with the enrolled kth-DBN-vector to make a verification decision. To validate the effectiveness of the learned DBN-vectors for speaker verification, experiments were conducted on Mandarin corpora. The proposed DBN-vector based SV system outperforms the state-of-the-art i-vector based SV system under channel mismatch conditions in terms of equal error rate (EER) and minimum detection cost function (minDCF).
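The enrollment and test phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid feed-forward pass stands in for a trained DBN, the names `hidden_activations`, `dbn_vector`, `cosine_score`, and `verify` are hypothetical, and cosine similarity with a fixed threshold is an assumed comparison rule, since the abstract only says the two vectors are "compared".

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def hidden_activations(frames, weights, biases, k):
    """Propagate spectral frames (n_frames, n_in) through the first k layers.

    weights/biases are assumed to come from an already trained network;
    sigmoid hidden units are an assumption of this sketch.
    """
    h = frames
    for W, b in zip(weights[:k], biases[:k]):
        h = sigmoid(h @ W + b)
    return h


def dbn_vector(frames, weights, biases, k):
    """Average the kth-hidden-layer features over all frames of an utterance."""
    return hidden_activations(frames, weights, biases, k).mean(axis=0)


def cosine_score(a, b):
    """Cosine similarity between two vectors (assumed scoring rule)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(enrolled_vec, test_vec, threshold):
    """Accept the identity claim when the score reaches the threshold."""
    return cosine_score(enrolled_vec, test_vec) >= threshold


if __name__ == "__main__":
    # Random weights and frames stand in for a trained DBN and real MFCCs.
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((39, 16)), rng.standard_normal((16, 8))]
    biases = [np.zeros(16), np.zeros(8)]
    enroll_frames = rng.standard_normal((200, 39))
    test_frames = rng.standard_normal((150, 39))

    enrolled = dbn_vector(enroll_frames, weights, biases, k=2)
    test = dbn_vector(test_frames, weights, biases, k=2)
    print("score:", cosine_score(enrolled, test))
    print("accepted:", verify(enrolled, test, threshold=0.5))
```

In practice the threshold would be tuned on held-out trials, for example at the operating point that yields the EER; the value 0.5 here is only a placeholder.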