Quantifying prediction uncertainties in automatic speaker verification systems

IF 3.1 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Miao Jing, Vidhyasaharan Sethu, Beena Ahmed, Kong Aik Lee
{"title":"Quantifying prediction uncertainties in automatic speaker verification systems","authors":"Miao Jing ,&nbsp;Vidhyasaharan Sethu ,&nbsp;Beena Ahmed ,&nbsp;Kong Aik Lee","doi":"10.1016/j.csl.2025.101806","DOIUrl":null,"url":null,"abstract":"<div><div>For modern automatic speaker verification (ASV) systems, explicitly quantifying the confidence for each prediction strengthens the system’s reliability by indicating in which case the system is with trust. However, current paradigms do not take this into consideration. We thus propose to express confidence in the prediction by quantifying the uncertainty in ASV predictions. This is achieved by developing a novel Bayesian framework to obtain a score distribution for each input. The mean of the distribution is used to derive the decision while the spread of the distribution represents the uncertainty arising from the plausible choices of the model parameters. To capture the plausible choices, we sample the probabilistic linear discriminant analysis (PLDA) back-end model posterior through Hamiltonian Monte-Carlo (HMC) and approximate the embedding model posterior through stochastic Langevin dynamics (SGLD) and Bayes-by-backprop. Given the resulting score distribution, a further quantification and decomposition of the prediction uncertainty are achieved by calculating the score variance, entropy, and mutual information. The quantified uncertainties include the aleatoric uncertainty and epistemic uncertainty (model uncertainty). We evaluate them by observing how they change while varying the amount of training speech, the duration, and the noise level of testing speech. The experiments indicate that the behaviour of those quantified uncertainties reflects the changes we made to the training and testing data, demonstrating the validity of the proposed method as a measure of uncertainty.</div></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":"94 ","pages":"Article 101806"},"PeriodicalIF":3.1000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230825000312","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

For modern automatic speaker verification (ASV) systems, explicitly quantifying the confidence of each prediction strengthens the system's reliability by indicating the cases in which its output can be trusted. However, current paradigms do not take this into consideration. We therefore propose to express confidence in a prediction by quantifying the uncertainty in ASV predictions. This is achieved through a novel Bayesian framework that yields a score distribution for each input. The mean of the distribution is used to derive the decision, while its spread represents the uncertainty arising from the plausible choices of model parameters. To capture these plausible choices, we sample the probabilistic linear discriminant analysis (PLDA) back-end model posterior using Hamiltonian Monte Carlo (HMC) and approximate the embedding model posterior using stochastic gradient Langevin dynamics (SGLD) and Bayes-by-backprop. Given the resulting score distribution, the prediction uncertainty is further quantified and decomposed by calculating the score variance, entropy, and mutual information. The quantified uncertainties include aleatoric uncertainty and epistemic (model) uncertainty. We evaluate them by observing how they change as we vary the amount of training speech and the duration and noise level of the test speech. The experiments indicate that the behaviour of the quantified uncertainties reflects the changes made to the training and testing data, demonstrating the validity of the proposed method as a measure of uncertainty.
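As a minimal sketch of the entropy-based decomposition described in the abstract (not the authors' implementation), the Python snippet below takes Monte Carlo score samples for a single verification trial, one per posterior draw of the model parameters (e.g. from HMC, SGLD, or Bayes-by-backprop), and computes the score variance, the total predictive entropy, its expected-entropy (aleatoric) component, and the mutual-information (epistemic) component. Mapping scores to target-trial probabilities with a plain sigmoid is an illustrative assumption; the calibration used in the paper may differ.

import numpy as np

def decompose_uncertainty(sampled_scores):
    """Decompose prediction uncertainty from Monte Carlo score samples.

    sampled_scores: shape (S,), verification scores for one trial, each
    produced by one posterior sample of the model parameters.
    """
    # Illustrative assumption: map each sampled score to a target-trial
    # probability with a sigmoid.
    p = 1.0 / (1.0 + np.exp(-np.asarray(sampled_scores, dtype=float)))
    eps = 1e-12

    def binary_entropy(q):
        q = np.clip(q, eps, 1.0 - eps)
        return -(q * np.log(q) + (1.0 - q) * np.log(1.0 - q))

    p_mean = p.mean()                       # mean of the score-induced distribution
    total = binary_entropy(p_mean)          # total predictive entropy
    aleatoric = binary_entropy(p).mean()    # expected entropy under the posterior
    epistemic = total - aleatoric           # mutual information between label and parameters

    return {
        "score_mean": float(np.mean(sampled_scores)),
        "score_variance": float(np.var(sampled_scores)),
        "total_entropy": float(total),
        "aleatoric": float(aleatoric),
        "epistemic_mutual_information": float(epistemic),
    }

# Example: 1000 hypothetical posterior score samples for one trial.
rng = np.random.default_rng(0)
print(decompose_uncertainty(rng.normal(loc=1.5, scale=0.8, size=1000)))

Because the mutual information is the gap between the total predictive entropy and the expected entropy, it shrinks toward zero when all posterior samples agree, which is the behaviour one would expect of the epistemic term as more training speech becomes available.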
Source journal

Computer Speech and Language (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Annual number of articles: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.