对深度假语音检测的全面调查和批判性分析

IF 13.3 1区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science Review Pub Date : 2025-04-16 DOI:10.1016/j.cosrev.2025.100757

Lam Pham , Phat Lam , Dat Tran , Hieu Tang , Tin Nguyen , Alexander Schindler , Florian Skopik , Alexander Polonsky , Hai Canh Vu

{"title":"对深度假语音检测的全面调查和批判性分析","authors":"Lam Pham , Phat Lam , Dat Tran , Hieu Tang , Tin Nguyen , Alexander Schindler , Florian Skopik , Alexander Polonsky , Hai Canh Vu","doi":"10.1016/j.cosrev.2025.100757","DOIUrl":null,"url":null,"abstract":"<div><div>Thanks to advancements in deep learning, speech generation systems now power a variety of real-world applications, such as text-to-speech for individuals with speech disorders, voice chatbots in call centers, cross-linguistic speech translation, etc. While these systems can autonomously generate human-like speech and replicate specific voices, they also pose risks when misused for malicious purposes. This motivates the research community to develop models for detecting synthesized speech (e.g., fake speech) generated by deep-learning-based models, referred to as the Deepfake Speech Detection task. As the Deepfake Speech Detection task has emerged in recent years, there are not many survey papers proposed for this task. Additionally, existing surveys for the Deepfake Speech Detection task tend to summarize techniques used to construct a Deepfake Speech Detection system rather than providing a thorough analysis. This gap motivated us to conduct a comprehensive survey, providing a critical analysis of the challenges and developments in Deepfake Speech Detection (This work is a part of our projects of STARLIGHT, EUCINF, and DEFAME FAKEs). Our survey is innovatively structured, offering an in-depth analysis of current challenge competitions, public datasets, and the deep-learning techniques that provide enhanced solutions to address existing challenges in the field. From our analysis, we propose hypotheses on leveraging and combining specific deep learning techniques to improve the effectiveness of Deepfake Speech Detection systems. Beyond conducting a survey, we perform extensive experiments to validate these hypotheses and propose a highly competitive model for the task of Deepfake Speech Detection. Given the analysis and the experimental results, we finally indicate potential and promising research directions for the Deepfake Speech Detection task.</div></div>","PeriodicalId":48633,"journal":{"name":"Computer Science Review","volume":"57 ","pages":"Article 100757"},"PeriodicalIF":13.3000,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A comprehensive survey with critical analysis for deepfake speech detection\",\"authors\":\"Lam Pham , Phat Lam , Dat Tran , Hieu Tang , Tin Nguyen , Alexander Schindler , Florian Skopik , Alexander Polonsky , Hai Canh Vu\",\"doi\":\"10.1016/j.cosrev.2025.100757\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Thanks to advancements in deep learning, speech generation systems now power a variety of real-world applications, such as text-to-speech for individuals with speech disorders, voice chatbots in call centers, cross-linguistic speech translation, etc. While these systems can autonomously generate human-like speech and replicate specific voices, they also pose risks when misused for malicious purposes. This motivates the research community to develop models for detecting synthesized speech (e.g., fake speech) generated by deep-learning-based models, referred to as the Deepfake Speech Detection task. As the Deepfake Speech Detection task has emerged in recent years, there are not many survey papers proposed for this task. Additionally, existing surveys for the Deepfake Speech Detection task tend to summarize techniques used to construct a Deepfake Speech Detection system rather than providing a thorough analysis. This gap motivated us to conduct a comprehensive survey, providing a critical analysis of the challenges and developments in Deepfake Speech Detection (This work is a part of our projects of STARLIGHT, EUCINF, and DEFAME FAKEs). Our survey is innovatively structured, offering an in-depth analysis of current challenge competitions, public datasets, and the deep-learning techniques that provide enhanced solutions to address existing challenges in the field. From our analysis, we propose hypotheses on leveraging and combining specific deep learning techniques to improve the effectiveness of Deepfake Speech Detection systems. Beyond conducting a survey, we perform extensive experiments to validate these hypotheses and propose a highly competitive model for the task of Deepfake Speech Detection. Given the analysis and the experimental results, we finally indicate potential and promising research directions for the Deepfake Speech Detection task.</div></div>\",\"PeriodicalId\":48633,\"journal\":{\"name\":\"Computer Science Review\",\"volume\":\"57 \",\"pages\":\"Article 100757\"},\"PeriodicalIF\":13.3000,\"publicationDate\":\"2025-04-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Science Review\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1574013725000334\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Science Review","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1574013725000334","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

由于深度学习的进步，语音生成系统现在为各种现实世界的应用提供动力，例如语音障碍患者的文本到语音，呼叫中心的语音聊天机器人，跨语言语音翻译等。虽然这些系统可以自动生成类似人类的语音并复制特定的声音，但当它们被滥用于恶意目的时，也会带来风险。这促使研究界开发用于检测由基于深度学习的模型生成的合成语音（例如假语音）的模型，称为Deepfake语音检测任务。由于近年来出现了Deepfake语音检测任务，因此针对该任务的调查论文并不多。此外，现有的关于Deepfake语音检测任务的调查倾向于总结用于构建Deepfake语音检测系统的技术，而不是提供彻底的分析。这一差距促使我们进行了全面的调查，对Deepfake语音检测的挑战和发展进行了批判性的分析（这项工作是我们的STARLIGHT， EUCINF和DEFAME FAKEs项目的一部分）。我们的调查结构新颖，提供了对当前挑战赛、公共数据集和深度学习技术的深入分析，这些技术为解决该领域现有挑战提供了增强的解决方案。根据我们的分析，我们提出了利用和结合特定深度学习技术来提高Deepfake语音检测系统有效性的假设。除了进行调查之外，我们还进行了大量的实验来验证这些假设，并为Deepfake语音检测任务提出了一个高度竞争的模型。根据分析和实验结果，我们最后指出了Deepfake语音检测任务的潜在和有前途的研究方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A comprehensive survey with critical analysis for deepfake speech detection

Thanks to advancements in deep learning, speech generation systems now power a variety of real-world applications, such as text-to-speech for individuals with speech disorders, voice chatbots in call centers, cross-linguistic speech translation, etc. While these systems can autonomously generate human-like speech and replicate specific voices, they also pose risks when misused for malicious purposes. This motivates the research community to develop models for detecting synthesized speech (e.g., fake speech) generated by deep-learning-based models, referred to as the Deepfake Speech Detection task. As the Deepfake Speech Detection task has emerged in recent years, there are not many survey papers proposed for this task. Additionally, existing surveys for the Deepfake Speech Detection task tend to summarize techniques used to construct a Deepfake Speech Detection system rather than providing a thorough analysis. This gap motivated us to conduct a comprehensive survey, providing a critical analysis of the challenges and developments in Deepfake Speech Detection (This work is a part of our projects of STARLIGHT, EUCINF, and DEFAME FAKEs). Our survey is innovatively structured, offering an in-depth analysis of current challenge competitions, public datasets, and the deep-learning techniques that provide enhanced solutions to address existing challenges in the field. From our analysis, we propose hypotheses on leveraging and combining specific deep learning techniques to improve the effectiveness of Deepfake Speech Detection systems. Beyond conducting a survey, we perform extensive experiments to validate these hypotheses and propose a highly competitive model for the task of Deepfake Speech Detection. Given the analysis and the experimental results, we finally indicate potential and promising research directions for the Deepfake Speech Detection task.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computer Science Review Computer Science-General Computer Science

CiteScore

32.70

自引率

0.00%

发文量

审稿时长

51 days

期刊介绍： Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known Computer Science methods to other areas are in scope only if these articles advance the fundamental understanding of those methods.