ASVspoof 5: Design, collection and validation of resources for spoofing, deepfake, and adversarial attack detection using crowdsourced speech

IF 3.4 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Speech and Language | Pub Date: 2025-05-28 | DOI: 10.1016/j.csl.2025.101825

Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Vishwanath Singh

Volume 95, Article 101825. Full text: https://www.sciencedirect.com/science/article/pii/S0885230825000506

Citations: 0

Abstract


ASVspoof 5: Design, collection and validation of resources for spoofing, deepfake, and adversarial attack detection using crowdsourced speech

ASVspoof 5 is the fifth edition in a series of challenges which promote the study of speech spoofing and deepfake attacks as well as the design of detection solutions. We introduce the ASVspoof 5 database which is generated in a crowdsourced fashion from data collected in diverse acoustic conditions (cf. studio-quality data for earlier ASVspoof databases) and from ~2000 speakers (cf. ~100 earlier). The database contains attacks generated with 32 different algorithms, also crowdsourced, and optimised to varying degrees using new surrogate detection models. Among them are attacks generated with a mix of legacy and contemporary text-to-speech synthesis and voice conversion models, in addition to adversarial attacks which are incorporated for the first time. ASVspoof 5 protocols comprise seven speaker-disjoint partitions. They include two distinct partitions for the training of different sets of attack models, two more for the development and evaluation of surrogate detection models, and then three additional partitions which comprise the ASVspoof 5 training, development and evaluation sets. An auxiliary set of data collected from an additional 30k speakers can also be used to train speaker encoders for the implementation of attack algorithms. Also described herein is an experimental validation of the new ASVspoof 5 database using a set of automatic speaker verification and spoof/deepfake baseline detectors. With the exception of protocols and tools for the generation of spoofed/deepfake speech, the resources described in this paper, already used by participants of the ASVspoof 5 challenge in 2024, are now all freely available to the community.
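
The seven-partition layout described in the abstract can be illustrated with a minimal Python sketch. The abstract gives only the role of each partition, so the partition labels, the toy speaker IDs, and the helper `find_speaker_overlaps` below are all hypothetical and are not taken from the ASVspoof 5 protocol files. The sketch only shows what "speaker-disjoint" means in practice: no speaker ID appears in more than one partition.

```python
from itertools import combinations

# Hypothetical labels for the seven partitions; the abstract names only their roles.
PARTITION_ROLES = {
    "attack_train_1": "training one set of attack models",
    "attack_train_2": "training a different set of attack models",
    "surrogate_dev": "development of surrogate detection models",
    "surrogate_eval": "evaluation of surrogate detection models",
    "train": "ASVspoof 5 training set",
    "dev": "ASVspoof 5 development set",
    "eval": "ASVspoof 5 evaluation set",
}


def find_speaker_overlaps(partitions):
    """Return {(name_a, name_b): shared_speaker_ids} for each overlapping pair.

    `partitions` maps a partition name to an iterable of speaker IDs.
    An empty result means the partitions are speaker-disjoint.
    """
    overlaps = {}
    for (name_a, spk_a), (name_b, spk_b) in combinations(partitions.items(), 2):
        shared = set(spk_a) & set(spk_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy speaker IDs (made up): three distinct speakers per partition.
toy_partitions = {
    name: {f"spk{index * 3 + offset:03d}" for offset in range(3)}
    for index, name in enumerate(PARTITION_ROLES)
}

if __name__ == "__main__":
    leaks = find_speaker_overlaps(toy_partitions)
    print("speaker-disjoint" if not leaks else f"overlaps found: {leaks}")
```

A check of this kind is typically run once on the protocol lists before training, since a speaker shared between, say, an attack-training partition and the evaluation set would make detection results look better than they are.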


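Source journal

Computer Speech and Language (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 11.30

Self-citation rate: 4.70%

Articles published:

Review time: 22.9 weeks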

Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.