Manasi Chhibber, Jagabandhu Mishra, Hyejin Shim, Tomi H. Kinnunen
arXiv - EE - Audio and Speech Processing, 2024-09-17
doi: arxiv-2409.11027 (https://doi.org/arxiv-2409.11027)
An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization
We propose a novel approach for spoofed speech characterization through
explainable probabilistic attribute embeddings. In contrast to high-dimensional
raw embeddings extracted from a spoofing countermeasure (CM) whose dimensions
are not easy to interpret, the probabilistic attributes are designed to gauge
the presence or absence of sub-components that make up a specific spoofing
attack. These attributes are then applied to two downstream tasks: spoofing
detection and attack attribution. To extend interpretability to the back-end
as well, we adopt a decision tree classifier. Our experiments on the
ASVspoof2019 dataset with spoof CM embeddings extracted from three models
(AASIST, Rawboost-AASIST, SSL-AASIST) suggest that the performance of the
attribute embeddings is on par with that of the original raw spoof CM embeddings for
both tasks. The best performance achieved with the proposed approach for
spoofing detection and attack attribution, in terms of accuracy, is 99.7% and
99.2%, respectively, compared to 99.7% and 94.7% using the raw CM embeddings.
To analyze the relative contribution of each attribute, we estimate its
Shapley value. Attributes related to acoustic feature prediction, waveform
generation (vocoder), and speaker modeling are found to be important for spoofing
detection, while duration modeling, vocoder, and input type play a role in
spoofing attack attribution.
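
As a rough illustration of two ideas in the abstract, the sketch below builds a probabilistic attribute embedding by concatenating per-attribute posteriors, and then computes exact Shapley values for a toy scoring function. The abstract does not say how the posteriors or the Shapley values are actually obtained, so everything here is an assumption made for illustration: the attribute names, the logits, the weights in `toy_value`, and the choice of exact enumeration (which is only feasible for a handful of attributes).

```python
from itertools import combinations
from math import exp, factorial


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_embedding(logits_per_attribute):
    """Concatenate the posterior of every attribute into one vector."""
    embedding = []
    for logits in logits_per_attribute.values():
        embedding.extend(softmax(logits))
    return embedding


def shapley_values(players, value):
    """Exact Shapley values: weighted marginal contribution over all coalitions."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi += weight * (value(s | {p}) - value(s))
        result[p] = phi
    return result


# Hypothetical per-attribute logits for one utterance (not from the paper).
LOGITS = {
    "vocoder": [2.0, 0.1, -1.0],
    "duration_model": [0.3, 0.2],
    "input_type": [1.5, -0.5],
}

# Hypothetical additive weights plus one interaction term (not from the paper).
WEIGHTS = {"vocoder": 0.5, "duration_model": 0.2, "input_type": 0.1}


def toy_value(coalition):
    """Toy score of a classifier that only sees the attributes in `coalition`."""
    score = sum(WEIGHTS[a] for a in coalition)
    if {"vocoder", "duration_model"} <= coalition:
        score += 0.2  # interaction is shared equally by the two attributes
    return score


embedding = attribute_embedding(LOGITS)
phi = shapley_values(list(LOGITS), toy_value)

print(len(embedding), "dimensional attribute embedding")
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

Each block of the embedding is a probability distribution over the options of one attribute, which is what makes the vector readable dimension by dimension. The Shapley values sum to the difference between the score with all attributes and the score with none, so they split the toy score among the attributes.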