Class specificity and commonality based discriminative dictionary for speaker verification
Nagendra Kumar, R. Sinha
2016 Twenty Second National Conference on Communication (NCC), published 2016-03-01
DOI: 10.1109/NCC.2016.7561185
Citations: 4
Abstract
This paper explores learning a speaker dictionary that encodes both class-specific and class-common information in order to enhance discriminative ability in sparse representation based speaker verification (SV). Typically, a KSVD-learned dictionary is employed. Such a dictionary is well suited to minimizing representation error but is not optimized for classification. The work is motivated by an approach with a similar objective reported in the context of image classification. We explore that idea on the NIST 2012 Speaker Recognition Evaluation (SRE) dataset, a large-scale SV task with multiple sources of variability. The class-specific part of the learned discriminative dictionary is initialized with an exemplar dictionary, while the class-common part is initialized with a global KSVD-learned dictionary having a small number of atoms. Since the atoms in the common part are not intended to capture discriminative information, the sparse coefficients corresponding to them are discarded during classification, and the decision is made using only the remaining coefficients. The method is compared with sparse representation over an exemplar dictionary and with the state-of-the-art i-vector Gaussian PLDA SV method. Compared with the exemplar-based SV method, it provides relative improvements of 4.5% in the average detection cost function (CDET) and 13% in the equal error rate (EER). Compared with the i-vector Gaussian PLDA method, it yields a poorer EER but a 15% relative improvement in CDET. The approach is also found to be more robust to additive noise.
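The classification step described in the abstract (sparse-code a test vector over a dictionary made of class-specific and class-common atoms, drop the common-part coefficients, decide from the rest) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. Atoms are unit-normalized vectors standing in for whatever speaker representation the paper uses. Sparse coding uses orthogonal matching pursuit (OMP). The common atoms are supplied directly rather than learned with KSVD and refined by discriminative dictionary learning, a step that is not shown. The score (fraction of absolute coefficient mass on the claimed speaker's atoms) is a hypothetical choice, since the abstract does not state the scoring rule. All function names, the `COMMON` label, and the toy data are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
COMMON = "<common>"  # hypothetical label marking class-common atoms


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else [float(x) for x in v]


def build_dictionary(exemplars: Dict[str, List[Vector]],
                     common_atoms: List[Vector]) -> Tuple[List[Vector], List[str]]:
    """Stack unit-norm speaker exemplars (class-specific part), then common atoms."""
    atoms: List[Vector] = []
    labels: List[str] = []
    for speaker, vectors in exemplars.items():
        for v in vectors:
            atoms.append(normalize(v))
            labels.append(speaker)
    for v in common_atoms:
        atoms.append(normalize(v))
        labels.append(COMMON)
    return atoms, labels


def least_squares(atoms: List[Vector], y: Vector) -> Vector:
    """Solve the normal equations (A^T A) c = A^T y by Gauss-Jordan elimination."""
    k = len(atoms)
    g = [[dot(atoms[i], atoms[j]) for j in range(k)] + [dot(atoms[i], y)]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(g[r][col]))
        g[col], g[pivot] = g[pivot], g[col]
        for r in range(k):
            if r != col:
                f = g[r][col] / g[col][col]
                g[r] = [a - f * b for a, b in zip(g[r], g[col])]
    return [g[i][k] / g[i][i] for i in range(k)]


def omp(atoms: List[Vector], y: Vector, sparsity: int) -> Vector:
    """Orthogonal matching pursuit: greedy sparse code of y over the atoms."""
    if sparsity < 1:
        raise ValueError("sparsity must be >= 1")
    support: List[int] = []
    residual = list(y)
    sol: Vector = []
    for _ in range(min(sparsity, len(atoms))):
        # Select the unused atom most correlated with the current residual.
        best = max((i for i in range(len(atoms)) if i not in support),
                   key=lambda i: abs(dot(atoms[i], residual)))
        support.append(best)
        # Re-fit all selected atoms jointly (the "orthogonal" step of OMP).
        sol = least_squares([atoms[i] for i in support], y)
        approx = [sum(c * atoms[i][d] for c, i in zip(sol, support))
                  for d in range(len(y))]
        residual = [a - b for a, b in zip(y, approx)]
        if math.sqrt(dot(residual, residual)) < 1e-12:
            break
    coeffs = [0.0] * len(atoms)
    for c, i in zip(sol, support):
        coeffs[i] = c
    return coeffs


def verification_score(atoms: List[Vector], labels: List[str], y: Vector,
                       claimed: str, sparsity: int,
                       discard_common: bool = True) -> float:
    """Hypothetical score: share of |coefficient| mass on the claimed speaker."""
    x = omp(atoms, y, sparsity)
    if discard_common:
        # Common atoms are not meant to be discriminative, so drop their coefficients.
        x = [0.0 if lab == COMMON else c for c, lab in zip(x, labels)]
    total = sum(abs(c) for c in x)
    claimed_mass = sum(abs(c) for c, lab in zip(x, labels) if lab == claimed)
    return claimed_mass / total if total > 0 else 0.0


# Toy usage: two speakers with one exemplar each, plus one common atom.
atoms, labels = build_dictionary(
    {"spk_A": [[1, 0, 0, 0]], "spk_B": [[0, 1, 0, 0]]},
    common_atoms=[[0, 0, 1, 0]],
)
y_test = [0.9, 0.1, 0.5, 0.0]
print(verification_score(atoms, labels, y_test, "spk_A", sparsity=3))  # ≈ 0.9
print(verification_score(atoms, labels, y_test, "spk_A", sparsity=3,
                         discard_common=False))  # ≈ 0.6
```

In the toy example the test vector is mostly speaker A's atom plus a component along the common atom. Keeping the common coefficient in the denominator dilutes the claimed-speaker score from 0.9 to 0.6, which illustrates in miniature why coefficients on non-discriminative atoms are excluded from the decision. A real system would compare the score against a threshold tuned on development data.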