Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models

Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva
{"title":"Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models","authors":"Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva","doi":"arxiv-2405.02179","DOIUrl":null,"url":null,"abstract":"Generalization is a main issue for current audio deepfake detectors, which\nstruggle to provide reliable results on out-of-distribution data. Given the\nspeed at which more and more accurate synthesis methods are developed, it is\nvery important to design techniques that work well also on data they were not\ntrained for. In this paper we study the potential of large-scale pre-trained\nmodels for audio deepfake detection, with special focus on generalization\nability. To this end, the detection problem is reformulated in a speaker\nverification framework and fake audios are exposed by the mismatch between the\nvoice sample under test and the voice of the claimed identity. With this\nparadigm, no fake speech sample is necessary in training, cutting off any link\nwith the generation method at the root, and ensuring full generalization\nability. Features are extracted by general-purpose large pre-trained models,\nwith no need for training or fine-tuning on specific fake detection or speaker\nverification datasets. At detection time only a limited set of voice fragments\nof the identity under test is required. 
Experiments on several datasets\nwidespread in the community show that detectors based on pre-trained models\nachieve excellent performance and show strong generalization ability, rivaling\nsupervised methods on in-distribution data and largely overcoming them on\nout-of-distribution data.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.02179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Generalization is a major issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which ever more accurate synthesis methods are developed, it is very important to design techniques that also work well on data they were not trained on. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework, and fake audio is exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake-detection or speaker-verification datasets. At detection time, only a limited set of voice fragments of the identity under test is required. Experiments on several datasets widespread in the community show that detectors based on pre-trained models achieve excellent performance and strong generalization ability, rivaling supervised methods on in-distribution data and largely outperforming them on out-of-distribution data.
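The verification-style decision described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that speaker embeddings have already been extracted from the audio by some general-purpose pre-trained model, and it uses cosine similarity against the claimed identity's reference fragments with a hypothetical threshold — the embedding model, scoring rule, and threshold value are all assumptions for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_identity(test_emb: np.ndarray,
                    reference_embs: list[np.ndarray],
                    threshold: float = 0.5) -> tuple[bool, float]:
    """Compare the test embedding against reference embeddings of the
    claimed identity. If the best similarity falls below the threshold,
    the sample mismatches the claimed voice and is flagged as fake.
    Returns (is_genuine, best_score)."""
    best = max(cosine_similarity(test_emb, r) for r in reference_embs)
    return best >= threshold, best
```

Note that no fake sample ever enters this pipeline: the decision depends only on real voice fragments of the claimed identity, which is what decouples the detector from any specific generation method.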