Pitch based selection of optimal search space at runtime: Speaker recognition perspective

Soma Khan, Joyanta Basu, M. S. Bepari, Rajib Roy
Published in: 2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), 2012-12-01
DOI: 10.1109/IHCI.2012.6481822 (https://doi.org/10.1109/IHCI.2012.6481822)
Citation count: 2

Abstract

Large-scale speaker recognition (SR) applications demand an efficient design strategy and smart optimization techniques to remain usable in real time. Selecting an optimal search space at runtime can reduce the computational cost involved. This paper describes a multilayer design layout with a novel Pitch Based Dynamic Pruning (PBDP) algorithm that optimizes VQ- and GMM-based closed-set SR. The process first selects the most likely speakers at runtime, based on the percentage of cumulative pitch occurrence frequencies within certain pitch ranges selected from the test utterance, and then performs spectral matching with MFCC features within the reduced search space. Experiments on the YOHO and NIST2008 corpora show that nearly 40% of the total identification time is saved, with a slight (below 0.5%) increase, or even a decrease, in average error rate. The proposed pruning method may also be applicable to selecting the most likely flexible background in the unconstrained cohort normalization task of the verification problem.
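The two-stage idea in the abstract (pitch-based pruning, then spectral matching on the survivors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' PBDP implementation: the abstract does not give the bin layout, the number of pitch ranges, or the pruning rule, so `edges`, `n_ranges`, `keep`, and all function names here are hypothetical, and a plain VQ distortion score stands in for the VQ/GMM matching stage.

```python
import numpy as np

def pitch_histogram(pitch_hz, edges):
    """Percentage of voiced frames falling in each pitch bin."""
    counts, _ = np.histogram(pitch_hz, bins=edges)
    total = counts.sum()
    return 100.0 * counts / total if total > 0 else np.zeros(len(counts))

def prune_speakers(test_pitch, speaker_hists, edges, n_ranges=3, keep=0.5):
    """Shortlist speakers whose enrolled pitch mass best covers the
    pitch ranges that dominate the test utterance.
    n_ranges and keep are assumed tuning knobs, not values from the paper."""
    test_hist = pitch_histogram(test_pitch, edges)
    # Pitch ranges selected from the test utterance: its most populated bins.
    top_bins = np.argsort(test_hist)[::-1][:n_ranges]
    # Cumulative percentage of each speaker's pitch values inside those bins.
    coverage = {spk: hist[top_bins].sum() for spk, hist in speaker_hists.items()}
    n_keep = max(1, int(np.ceil(keep * len(coverage))))
    ranked = sorted(coverage, key=coverage.get, reverse=True)
    return ranked[:n_keep]

def vq_distortion(features, codebook):
    """Mean squared distance from each feature frame to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def identify(test_pitch, test_mfcc, speaker_hists, codebooks, edges, **kw):
    """Stage 1: pitch-based pruning. Stage 2: spectral matching on the shortlist."""
    shortlist = prune_speakers(test_pitch, speaker_hists, edges, **kw)
    best = min(shortlist, key=lambda s: vq_distortion(test_mfcc, codebooks[s]))
    return best, shortlist
```

Here `speaker_hists` maps each enrolled speaker to a histogram built with `pitch_histogram` at enrollment time, and `codebooks` maps each speaker to an array of MFCC codewords. The time saving comes from running the comparatively expensive `vq_distortion` only on the shortlist rather than on every enrolled speaker.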