Speaker identification through spectral entropy analysis

2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC) Pub Date: 2017-11-01 DOI: 10.1109/ROPEC.2017.8261607

A. Camarena-Ibarrola, F. Luque, Edgar Chávez


Citations: 6

Abstract

Identifying a speaker in a noisy environment is still a challenging problem. In spite of the impressive efficacy of deep architectures, the resulting solution is an opaque mapping, a black box. For transparent classifiers, the standard features are the Mel-Frequency Cepstral Coefficients (MFCC). In this paper we use as features entropy vectors built from the first sixteen critical bands of the Bark scale. The classifier finds the database vector sequences closest to those of the query and counts the hits, as in a k-NN classifier. We compare two feature sets: the MFCC (the state of the art) and the proposed entropy vectors. For the text-independent case we used gapped vector sequences, which are discussed later in the paper. For text-dependent speaker identification we obtained 80% true positives at 10% false positives, while the MFCC reach about 60% true positives at the same false-positive rate; the results are about the same for the text-independent case. In general, the entropy vectors gave a larger area under the respective ROC curves than the MFCC.
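The abstract does not give the exact entropy definition, the frame sizes, or how the gapped sequences are built, so the following is only a minimal Python sketch of the general pipeline under stated assumptions. Spectral entropy is assumed to be the Shannon entropy (in bits) of the normalized power spectrum inside each of the first 16 Bark critical bands, using Zwicker's band edges from 20 Hz to 3150 Hz; frames are assumed to be 256 samples with a 128-sample hop and a rectangular window; and a "gapped" sequence is assumed to take every `gap`-th frame. The function names (`entropy_vector`, `sequences`, `identify`) and the parameters `k`, `length` and `gap` are hypothetical and are not the authors' code.

```python
import cmath
import math
from collections import Counter

# Band edges (Hz) of the first 16 Bark critical bands (Zwicker's table).
BARK_EDGES = [20, 100, 200, 300, 400, 510, 630, 770, 920,
              1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150]


def power_spectrum(frame):
    """Naive DFT power spectrum |X_k|^2 for k = 0..N/2 (slow, but stdlib-only)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def band_entropy(powers):
    """Shannon entropy (bits) of the power distribution inside one band."""
    total = sum(powers)
    if total == 0:
        return 0.0
    return -sum(p / total * math.log2(p / total) for p in powers if p > 0)


def entropy_vector(frame, sample_rate):
    """16-dimensional entropy vector for one frame (assumed entropy definition)."""
    spec = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    vec = []
    for lo, hi in zip(BARK_EDGES, BARK_EDGES[1:]):
        band = [p for k, p in enumerate(spec) if lo <= k * bin_hz < hi]
        vec.append(band_entropy(band))
    return vec


def entropy_features(signal, sample_rate, frame_len=256, hop=128):
    """Split the signal into frames (rectangular window) and extract one vector per frame."""
    return [entropy_vector(signal[i:i + frame_len], sample_rate)
            for i in range(0, len(signal) - frame_len + 1, hop)]


def sequences(features, length, gap=1):
    """Hypothetical gapped sequences: frames i, i+gap, ..., i+(length-1)*gap, flattened."""
    span = (length - 1) * gap
    return [[v for j in range(length) for v in features[i + j * gap]]
            for i in range(len(features) - span)]


def identify(query_feats, database, k=3, length=4, gap=1):
    """Each query sequence votes for the speakers of its k nearest database sequences.

    database: dict mapping a speaker label to that speaker's list of entropy vectors.
    Returns the label with the most hits, or None if no sequence could be formed.
    """
    refs = [(seq, spk) for spk, feats in database.items()
            for seq in sequences(feats, length, gap)]
    hits = Counter()
    for q in sequences(query_feats, length, gap):
        nearest = sorted(refs, key=lambda r: math.dist(q, r[0]))[:k]
        hits.update(spk for _, spk in nearest)
    return hits.most_common(1)[0][0] if hits else None
```

The naive DFT keeps the sketch dependency-free; a real implementation would use an FFT. Because each band's power is normalized before the entropy is taken, the feature is insensitive to overall gain, which is one plausible reason such features could behave differently from MFCC under level changes; whether this matches the authors' reasoning is not stated in the abstract. With an 8 kHz sampling rate the Nyquist frequency (4 kHz) covers all 16 bands.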


