Pitch based selection of optimal search space at runtime: Speaker recognition perspective
Soma Khan, Joyanta Basu, M. S. Bepari, Rajib Roy
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), published 2012-12-01
DOI: 10.1109/IHCI.2012.6481822 (https://doi.org/10.1109/IHCI.2012.6481822)
Citation count: 2
Abstract
Large-scale speaker recognition (SR) applications demand an efficient design strategy with smart optimization techniques to enhance real-time usability. Runtime selection of an optimal search space can reduce the computational cost involved. This paper describes a multilayer design layout with a novel Pitch Based Dynamic Pruning (PBDP) algorithm to optimize VQ- and GMM-based closed-set SR. The process involves runtime selection of the most likely speakers based on the percentage of cumulative pitch occurrence frequencies within certain pitch ranges selected from the test utterance, followed by spectral matching using MFCC features within the reduced search space. Experiments on the YOHO and NIST2008 corpora reveal that nearly 40% of the total identification time is saved, with a slight (below 0.5%) increase or even a decrease in average error rate. The proposed pruning method is also applicable to selecting the most likely flexible background in the unconstrained cohort normalization task of the verification problem.
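The two-stage idea in the abstract (prune by pitch first, then match spectrally on the survivors) can be illustrated with a minimal sketch. This is not the authors' PBDP implementation: the abstract does not give the bin width, how the test pitch ranges are chosen, or the pruning threshold, so `BIN_WIDTH_HZ`, `coverage`, `min_percent`, and all function names below are assumptions chosen for illustration. The spectral stage (VQ or GMM scoring on MFCC features) is represented only by a caller-supplied `spectral_score` function.

```python
from collections import Counter

BIN_WIDTH_HZ = 10  # assumed bin width; not taken from the paper


def pitch_histogram(pitch_hz, bin_width=BIN_WIDTH_HZ):
    """Count voiced-frame pitch values per bin; unvoiced frames (f0 <= 0) are skipped."""
    return Counter(int(f0 // bin_width) for f0 in pitch_hz if f0 > 0)


def select_test_bins(test_hist, coverage=0.8):
    """Pick the most frequent test bins until they cover `coverage` of voiced frames."""
    total = sum(test_hist.values())
    chosen, covered = set(), 0
    for bin_id, count in test_hist.most_common():
        if covered >= coverage * total:
            break
        chosen.add(bin_id)
        covered += count
    return chosen


def prune_speakers(speaker_hists, test_bins, min_percent=50.0):
    """Keep speakers whose share of training pitch counts inside test_bins is high enough."""
    kept = []
    for speaker, hist in speaker_hists.items():
        total = sum(hist.values())
        inside = sum(hist[b] for b in test_bins)  # Counter returns 0 for absent bins
        percent = 100.0 * inside / total if total else 0.0
        if percent >= min_percent:
            kept.append(speaker)
    return kept


def identify(test_pitch, speaker_hists, spectral_score):
    """Prune by pitch, then run the costly spectral match on the survivors only."""
    test_bins = select_test_bins(pitch_histogram(test_pitch))
    # Fall back to the full speaker set if pruning removes everyone.
    candidates = prune_speakers(speaker_hists, test_bins) or list(speaker_hists)
    return max(candidates, key=spectral_score)
```

In this sketch the saving comes from calling `spectral_score` only for speakers that pass the pitch filter. The fallback to the full speaker set is a design choice of the sketch, intended to avoid returning no decision when the thresholds are too strict.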