R. Saeidi, T. Kinnunen, Hamid Reza Mohammadi, R. Rodman, P. Fränti
{"title":"Joint frame and Gaussian selection for text independent speaker verification","authors":"R. Saeidi, T. Kinnunen, Hamid Reza Mohammadi, R. Rodman, P. Fränti","doi":"10.1109/ICASSP.2010.5495576","DOIUrl":null,"url":null,"abstract":"Gaussian selection is a technique applied in the GMM-UBM framework to accelerate score calculation. We have recently introduced a novel Gaussian selection method known as sorted GMM (SGMM). SGMM uses scalar-indexing of the universal background model mean vectors to achieve fast search of the top-scoring Gaussians. In the present work we extend this method by using 2-dimensional indexing, which leads to simultaneous frame and Gaussian selection. Our results on the NIST 2002 speaker recognition evaluation corpus indicate that both the 1- and 2- dimensional SGMMs outperform frame decimation and temporal tracking of top-scoring Gaussians by a wide margin (in terms of Gaussian computations relative to GMM-UBM as baseline).","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2010.5495576","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Gaussian selection is a technique applied in the GMM-UBM framework to accelerate score calculation. We have recently introduced a novel Gaussian selection method known as sorted GMM (SGMM). SGMM uses scalar-indexing of the universal background model mean vectors to achieve fast search of the top-scoring Gaussians. In the present work we extend this method by using 2-dimensional indexing, which leads to simultaneous frame and Gaussian selection. Our results on the NIST 2002 speaker recognition evaluation corpus indicate that both the 1- and 2- dimensional SGMMs outperform frame decimation and temporal tracking of top-scoring Gaussians by a wide margin (in terms of Gaussian computations relative to GMM-UBM as baseline).