An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization

Manasi Chhibber, Jagabandhu Mishra, Hyejin Shim, Tomi H. Kinnunen
arXiv - EE - Audio and Speech Processing, 2024-09-17
DOI: arxiv-2409.11027 (https://doi.org/arxiv-2409.11027)

Abstract

We propose a novel approach for spoofed speech characterization through explainable probabilistic attribute embeddings. In contrast to high-dimensional raw embeddings extracted from a spoofing countermeasure (CM), whose dimensions are not easy to interpret, the probabilistic attributes are designed to gauge the presence or absence of the sub-components that make up a specific spoofing attack. These attributes are then applied to two downstream tasks: spoofing detection and attack attribution. To extend interpretability to the back-end as well, we adopt a decision tree classifier. Our experiments on the ASVspoof2019 dataset with spoof CM embeddings extracted from three models (AASIST, Rawboost-AASIST, SSL-AASIST) suggest that the performance of the attribute embeddings is on par with that of the original raw spoof CM embeddings for both tasks. The best accuracy achieved with the proposed approach is 99.7% for spoofing detection and 99.2% for attack attribution, compared to 99.7% and 94.7%, respectively, using the raw CM embeddings. To analyze the relative contribution of each attribute, we estimate their Shapley values. Attributes related to acoustic feature prediction, waveform generation (vocoder), and speaker modeling are found to be important for spoofing detection, while duration modeling, vocoder, and input type play a role in spoofing attack attribution.
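The Shapley-value analysis mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the scoring function `toy_spoof_score`, the three attribute names, their probability values, and the all-zero baseline are hypothetical choices made only for illustration. The sketch computes exact Shapley values by enumerating every subset of attributes, which is feasible only for a handful of attributes.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Attributes in `subset` take their observed value; the rest stay at baseline.
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude `name`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


def toy_spoof_score(a):
    # Hypothetical scorer (an assumption, not the paper's model):
    # two linear terms plus one vocoder/speaker interaction.
    return (0.5 * a["vocoder"]
            + 0.3 * a["acoustic_model"]
            + 0.2 * a["vocoder"] * a["speaker_model"])


# Hypothetical attribute probabilities for one utterance.
x = {"vocoder": 0.9, "acoustic_model": 0.8, "speaker_model": 0.6}
baseline = {"vocoder": 0.0, "acoustic_model": 0.0, "speaker_model": 0.0}

phi = shapley_values(toy_spoof_score, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split evenly between the two attributes involved. For realistic attribute counts, exact enumeration grows exponentially, so an approximate estimator would be needed instead.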