Speaker identification through spectral entropy analysis
A. Camarena-Ibarrola, F. Luque, Edgar Chávez
2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), November 2017
DOI: 10.1109/ROPEC.2017.8261607
Citation count: 6
Abstract
Identifying a speaker in a noisy environment is still a challenging problem. Despite the impressive efficacy of deep architectures, the resulting solution is an opaque mapping, a black box. For transparent classifiers, the standard features are the Mel-Frequency Cepstral Coefficients (MFCC). In this paper, we use as features entropy vectors built from the first sixteen critical bands of the Bark scale. The classifier finds the database vector sequences closest to those of the query and counts the hits, as in a k-NN classifier. We evaluate this classifier with two feature sets: MFCC (the state of the art) and the proposed entropy vectors. For the text-independent case, we use gapped vector sequences, which are discussed later in the paper. For text-dependent speaker identification, we obtained an 80% true positive rate at a 10% false positive rate, whereas MFCC reached about 60% true positives at the same false positive rate. The results are roughly the same for the text-independent case. In general, the entropy vectors yielded a larger area under the respective ROC curves.
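The feature extraction and hit-counting classifier described above can be sketched in Python as follows. This is a minimal, standard-library-only sketch under assumptions the source does not state: each band's entropy is taken as the Shannon entropy of the power spectrum normalized within that band (the paper may use a different estimator), band edges follow the standard Zwicker critical-band table, vector sequences are runs of consecutive frame vectors compared by Euclidean distance, and the names `power_spectrum`, `band_entropy_vector`, `sequences`, `identify`, `BARK_EDGES_HZ` and the parameters `length` and `k` are hypothetical. The gapped sequences used for the text-independent case are not modeled.

```python
import cmath
import math
from collections import Counter

# Standard (Zwicker) critical-band edges in Hz. The first 17 edges delimit
# the first sixteen Bark bands; the lowest band is assumed to start at 0 Hz.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150]


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def band_entropy_vector(frame, sample_rate):
    """Return one entropy value (bits) per Bark band for a single frame.

    Assumption: each value is the Shannon entropy of the power spectrum
    normalized within its band; the paper's estimator may differ.
    """
    spectrum = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    vector = []
    for lo, hi in zip(BARK_EDGES_HZ, BARK_EDGES_HZ[1:]):
        powers = [p for k, p in enumerate(spectrum) if lo <= k * bin_hz < hi]
        total = sum(powers)
        if total == 0.0:
            vector.append(0.0)  # silent or empty band: define entropy as 0
            continue
        probs = [p / total for p in powers if p > 0.0]
        vector.append(-sum(q * math.log2(q) for q in probs))
    return vector


def sequences(vectors, length):
    """Concatenate every run of `length` consecutive frame vectors."""
    return [[x for v in vectors[i:i + length] for x in v]
            for i in range(len(vectors) - length + 1)]


def identify(query_vectors, database, length=3, k=5):
    """Pick the speaker whose sequences collect the most k-NN hits.

    `database` maps a speaker id to that speaker's per-frame vectors.
    `length` and `k` are hypothetical defaults, not values from the paper.
    Returns (best speaker or None, Counter of hits per speaker).
    """
    labelled = [(seq, speaker) for speaker, vecs in database.items()
                for seq in sequences(vecs, length)]
    hits = Counter()
    for query in sequences(query_vectors, length):
        # Brute-force Euclidean search over all database sequences.
        nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))
        hits.update(speaker for _, speaker in nearest[:k])
    best = hits.most_common(1)[0][0] if hits else None
    return best, hits
```

The naive DFT and brute-force neighbor search only keep the example self-contained; a practical version would use an FFT and an indexed nearest-neighbor search. Frames are assumed to be short windows sampled fast enough (for instance 256 samples at 8 kHz) that all sixteen bands, which end at 3150 Hz, lie below the Nyquist frequency.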