Speaker identification through spectral entropy analysis

A. Camarena-Ibarrola, F. Luque, Edgar Chávez
Published in: 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC)
Publication date: 2017-11-01
DOI: 10.1109/ROPEC.2017.8261607 (https://doi.org/10.1109/ROPEC.2017.8261607)
Citations: 6

Abstract

Identifying a speaker in a noisy environment is still a challenging problem. Despite the impressive efficacy of deep architectures, the solution they produce is an obscure mapping, a black box. For transparent classifiers, the standard features are the Mel-Frequency Cepstral Coefficients (MFCC). In this paper we build entropy vectors from the first sixteen critical bands of the Bark scale and use them as features. The classifier finds the vector sequences in the database that are closest to those of the query and counts the hits, as in a k-NN classifier. In one case we use MFCC (the state of the art) and in the other we use the described entropy vectors. For the text-independent case, we used gapped vector sequences, which are discussed later in the paper. For text-dependent speaker identification we obtained 80% true positives at a 10% false-positive rate, whereas MFCC reached about 60% true positives at the same false-positive rate. The results were about the same for the text-independent case. In general, the entropy vectors yielded a larger area under the respective ROC curves.
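The abstract does not say how the entropy vectors or the hit counting are computed, so the following is only a minimal Python sketch of the general idea, under stated assumptions that are not taken from the paper: the band edges are the commonly tabulated critical-band limits of the Bark scale, the entropy of a band is the Shannon entropy of the normalized power spectrum inside that band, and matching is done on single vectors with Euclidean distance rather than on vector sequences. The names `band_entropy_vector` and `identify_by_hits` are hypothetical.

```python
import numpy as np

# Assumed edges (Hz) of the first 16 critical bands of the Bark scale
# (commonly tabulated values; the paper's exact limits may differ).
BARK_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920,
                          1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150],
                         dtype=float)


def band_entropy_vector(frame, sample_rate):
    """Return one 16-dimensional vector of per-band spectral entropies (bits)."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    entropies = np.zeros(len(BARK_EDGES_HZ) - 1)
    for b in range(len(entropies)):
        lo, hi = BARK_EDGES_HZ[b], BARK_EDGES_HZ[b + 1]
        band = power[(freqs >= lo) & (freqs < hi)]
        total = band.sum()
        if band.size < 2 or total <= 0.0:
            continue  # too few bins or no energy: leave the entropy at 0
        p = band / total          # treat the band spectrum as a distribution
        p = p[p > 0]              # skip empty bins so log2 stays finite
        entropies[b] = -np.sum(p * np.log2(p))
    return entropies


def identify_by_hits(query_vectors, database):
    """Count, per speaker, how many query vectors have their nearest
    reference vector in that speaker's data; return the winner and the counts.

    database maps a speaker label to an array of shape (n_vectors, 16).
    """
    hits = {speaker: 0 for speaker in database}
    for q in query_vectors:
        best_speaker, best_dist = None, np.inf
        for speaker, refs in database.items():
            dist = np.min(np.linalg.norm(refs - q, axis=1))
            if dist < best_dist:
                best_speaker, best_dist = speaker, dist
        hits[best_speaker] += 1
    return max(hits, key=hits.get), hits
```

With a 2048-sample frame at 16 kHz the FFT bins are about 7.8 Hz apart, so even the narrowest 100 Hz band contains roughly 13 bins. Much shorter frames would leave too few bins per band for the entropy estimate to be meaningful.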