Quantifying prediction uncertainties in automatic speaker verification systems
Miao Jing, Vidhyasaharan Sethu, Beena Ahmed, Kong Aik Lee
Computer Speech and Language, Volume 94, Article 101806 (2025). DOI: 10.1016/j.csl.2025.101806
Abstract
For modern automatic speaker verification (ASV) systems, explicitly quantifying the confidence of each prediction strengthens the system’s reliability by indicating when its outputs can be trusted. However, current paradigms do not take this into account. We therefore propose to express confidence in a prediction by quantifying the uncertainty in ASV predictions. This is achieved by developing a novel Bayesian framework that yields a score distribution for each input. The mean of the distribution is used to derive the decision, while its spread represents the uncertainty arising from the plausible choices of model parameters. To capture these plausible choices, we sample the posterior of the probabilistic linear discriminant analysis (PLDA) back-end model with Hamiltonian Monte Carlo (HMC) and approximate the posterior of the embedding model with stochastic gradient Langevin dynamics (SGLD) and Bayes-by-backprop. From the resulting score distribution, the prediction uncertainty is further quantified and decomposed by calculating the score variance, entropy, and mutual information. The quantified uncertainties comprise aleatoric uncertainty and epistemic (model) uncertainty. We evaluate them by observing how they change as we vary the amount of training speech and the duration and noise level of the testing speech. The experiments indicate that the behaviour of these quantified uncertainties reflects the changes made to the training and testing data, demonstrating the validity of the proposed method as a measure of uncertainty.
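The entropy-based decomposition described in the abstract matches the standard mutual-information split of predictive uncertainty used in Bayesian modelling. Below is a minimal Python sketch of that decomposition under two stated assumptions: each posterior sample of the model parameters (whether drawn via HMC over the PLDA back-end or via SGLD/Bayes-by-backprop over the embedding network) produces one log-likelihood-ratio (LLR) score, and each sampled score is mapped to a target-trial probability through a sigmoid. The function names and the sigmoid mapping are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli variable with probability p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log1p(-p))

def decompose_uncertainty(llr_scores):
    """Decompose prediction uncertainty from Monte Carlo score samples.

    llr_scores: array of shape (M,), one verification LLR score per
    posterior sample of the model parameters (hypothetical setup;
    the paper's exact scoring pipeline may differ).
    """
    scores = np.asarray(llr_scores, dtype=float)
    # Map each sampled LLR score to a target-trial probability (assumed sigmoid link).
    probs = 1.0 / (1.0 + np.exp(-scores))

    mean_prob = probs.mean()                  # basis for the accept/reject decision
    score_var = scores.var()                  # spread of the score distribution
    total = binary_entropy(mean_prob)         # predictive (total) entropy
    aleatoric = binary_entropy(probs).mean()  # expected entropy over posterior samples
    epistemic = total - aleatoric             # mutual information (model uncertainty)
    return mean_prob, score_var, total, aleatoric, epistemic
```

On this reading, scores sampled tightly around a large positive (or negative) LLR give near-zero mutual information, i.e. little model uncertainty, whereas samples that straddle the decision threshold inflate both the score variance and the epistemic term.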
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.