Class specificity and commonality based discriminative dictionary for speaker verification

2016 Twenty Second National Conference on Communication (NCC) | Pub Date: 2016-03-01 | DOI: 10.1109/NCC.2016.7561185

Nagendra Kumar, R. Sinha


Citations: 4

Abstract

This paper explores learning a speaker dictionary that encodes class-specific and class-common information, with the aim of improving discrimination in sparse-representation-based speaker verification (SV). Typically, a KSVD-learned dictionary is employed; it is well suited to minimizing the representation error but is not optimized for classification. The work is motivated by an approach with a similar objective reported for image classification. Here, that idea is explored on the NIST 2012 speaker recognition evaluation dataset, which forms a large, multi-variability SV task. The class-specific part of the learned discriminative dictionary is initialized with an exemplar dictionary, while the class-common part is initialized with a global KSVD-learned dictionary having a small number of atoms. Because the atoms in the common part are not intended to capture discriminative information, the sparse coefficients corresponding to them are discarded during classification, and the decision is made using the remaining coefficients only. The explored method is contrasted with sparse representation over an exemplar dictionary and with the state-of-the-art i-vector Gaussian PLDA based SV method. Compared with the exemplar-based SV method, it provides relative improvements of 4.5% and 13% in the average decision cost factor (CDET) and the equal error rate (EER), respectively. Compared with the i-vector Gaussian PLDA method, it yields a poorer EER but a relative improvement of 15% in CDET. The explored approach is also noted to be more robust to additive noise.
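The decision step described in the abstract (sparse-code a test vector over a dictionary made of class-specific and class-common atoms, drop the coefficients of the common atoms, and decide from the rest) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the dictionary is a fixed toy matrix with orthonormal atoms rather than a learned one, the sparse coder is a plain orthogonal matching pursuit, and the score (the share of remaining coefficient magnitude that falls on the claimed speaker's atoms) is a hypothetical scoring rule.

```python
import numpy as np


def omp(D, x, n_nonzero, tol=1e-10):
    """Greedy orthogonal matching pursuit: code x with at most n_nonzero atoms of D."""
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < tol:
            break
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = sol
    return coeffs


def verification_score(coeffs, atom_labels, claimed):
    """Hypothetical score: discard class-common coefficients (label -1), then
    return the fraction of remaining magnitude on the claimed speaker's atoms."""
    specific = atom_labels >= 0
    kept = np.abs(coeffs[specific])
    total = kept.sum()
    if total == 0.0:
        return 0.0
    return float(kept[atom_labels[specific] == claimed].sum() / total)


# Toy dictionary: 3 speakers x 4 class-specific atoms, plus 2 class-common atoms.
# Orthonormal atoms are used only so that the toy recovery is exact; real
# exemplar or learned dictionaries are not orthogonal.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((20, 14)))
atom_labels = np.array([0] * 4 + [1] * 4 + [2] * 4 + [-1] * 2)

# Test vector built from two atoms of speaker 1 and one class-common atom.
x = 1.0 * D[:, 4] + 0.8 * D[:, 5] + 0.5 * D[:, 12]
coeffs = omp(D, x, n_nonzero=5)

target_score = verification_score(coeffs, atom_labels, claimed=1)
impostor_score = verification_score(coeffs, atom_labels, claimed=0)
print(target_score, impostor_score)
```

In this toy case the class-common atom absorbs part of the signal, but its coefficient plays no role in the decision; a verification system would then compare the score against a threshold tuned on development data.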

