Leveraging distance information for generalized spoofing speech detection

Impact factor: 3.1; Region 3 (Computer Science); JCR Q2 (Computer Science, Artificial Intelligence)
Jingze Lu, Yuxiang Zhang, Zhuo Li, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang
Journal: Computer Speech and Language, volume 94, Article 101804
Publication date: 2025-04-20
DOI: 10.1016/j.csl.2025.101804
URL: https://www.sciencedirect.com/science/article/pii/S0885230825000294
Citations: 0

Abstract

Spoofing speech detection (SSD) systems generalize poorly to in-the-wild data, including unseen attacks and bonafide speech from unseen distributions, which limits their applicability in real-world scenarios. This performance degradation can be attributed to an inherent flaw of deep neural network (DNN)-based models: they overfit the training data. Inter-instance distance, which is underutilized in conventional DNN-based classifiers, proves beneficial in handling unseen samples. Our experiments indicate that, in certain feature spaces, bonafide utterances lie closer to one another than spoofed utterances do. Therefore, this paper proposes a distance-based method to enhance the generalization ability of anti-spoofing models. By incorporating distance features as a prefix, the proposed method requires only lightweight parameter updates while effectively detecting unseen attacks and bonafide utterances from unseen distributions. On the logical access (LA) tracks of ASVspoof 2019 and ASVspoof 2021, the proposed method achieves equal error rates (EERs) of 0.53% and 4.73%, respectively. Moreover, it achieves EERs of 1.86% and 7.30% on the ASVspoof 2021 Deepfake and IntheWild datasets, respectively, demonstrating its superior generalization ability. The proposed method outperforms other state-of-the-art (SOTA) methods on multiple datasets.
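
The abstract relies on two quantities: the distance between instances in a feature space and the equal error rate. The following is a minimal Python sketch of both, not the authors' implementation. The function names `mean_pairwise_distance` and `compute_eer`, the choice of Euclidean distance, the convention that higher scores mean "more bonafide", and the synthetic embeddings and scores are all assumptions made for illustration.

```python
import numpy as np


def mean_pairwise_distance(embeddings):
    """Mean Euclidean distance over all unordered pairs of rows.

    Hypothetical helper: the paper's actual distance features are not
    specified in the abstract.
    """
    x = np.asarray(embeddings, dtype=float)
    diffs = x[:, None, :] - x[None, :, :]      # shape (n, n, d)
    dists = np.linalg.norm(diffs, axis=-1)     # shape (n, n)
    upper = np.triu_indices(len(x), k=1)       # each pair counted once
    return dists[upper].mean()


def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate, assuming higher scores mean 'more bonafide'.

    Sweeps every observed score as a threshold and returns the point
    where false acceptance and false rejection rates are closest.
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # False rejection: bonafide utterance scored below the threshold.
    frr = np.array([(bona < t).mean() for t in thresholds])
    # False acceptance: spoofed utterance scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic embeddings: a tight "bonafide" cluster and a looser "spoof" one.
    bonafide_emb = rng.normal(0.0, 0.5, size=(50, 16))
    spoof_emb = rng.normal(0.0, 2.0, size=(50, 16))
    print("bonafide mean distance:", mean_pairwise_distance(bonafide_emb))
    print("spoof mean distance:   ", mean_pairwise_distance(spoof_emb))

    # Synthetic detector scores for the EER computation.
    bona_scores = rng.normal(1.0, 1.0, size=1000)
    spoof_scores = rng.normal(-1.0, 1.0, size=1000)
    print("EER:", compute_eer(bona_scores, spoof_scores))
```

With synthetic data of this kind, the tighter cluster yields the smaller mean pairwise distance, which mirrors the observation reported for bonafide speech; real embeddings and scores would come from an SSD model.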
Source journal: Computer Speech and Language (Engineering and Technology: Computer Science, Artificial Intelligence)
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
About the journal: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing have become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.