{"title":"A Survey on Speech Deepfake Detection","authors":"Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang","doi":"10.1145/3714458","DOIUrl":null,"url":null,"abstract":"The availability of smart devices leads to an exponential increase in multimedia content. However, advancements in deep learning have also enabled the creation of highly sophisticated deepfake content, including speech Deepfakes, which pose a serious threat by generating realistic voices and spreading misinformation. To combat this, numerous challenges have been organized to advance speech Deepfake detection techniques. In this survey, we systematically analyze more than 200 papers published up to March 2024. We provide a comprehensive review of each component in the detection pipeline, including model architectures, optimization techniques, generalizability, evaluation metrics, performance comparisons, available datasets, and open-source availability. For each aspect, we assess recent progress and discuss ongoing challenges. In addition, we explore emerging topics such as partial Deepfake detection, cross-dataset evaluation, and defences against adversarial attacks, while suggesting promising research directions. This survey not only identifies the current state-of-the-art to establish strong baselines for future experiments but also offers clear guidance for researchers aiming to enhance speech Deepfake detection systems.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"2 1","pages":""},"PeriodicalIF":23.8000,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3714458","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
The availability of smart devices leads to an exponential increase in multimedia content. However, advancements in deep learning have also enabled the creation of highly sophisticated deepfake content, including speech Deepfakes, which pose a serious threat by generating realistic voices and spreading misinformation. To combat this, numerous challenges have been organized to advance speech Deepfake detection techniques. In this survey, we systematically analyze more than 200 papers published up to March 2024. We provide a comprehensive review of each component in the detection pipeline, including model architectures, optimization techniques, generalizability, evaluation metrics, performance comparisons, available datasets, and open-source availability. For each aspect, we assess recent progress and discuss ongoing challenges. In addition, we explore emerging topics such as partial Deepfake detection, cross-dataset evaluation, and defences against adversarial attacks, while suggesting promising research directions. This survey not only identifies the current state-of-the-art to establish strong baselines for future experiments but also offers clear guidance for researchers aiming to enhance speech Deepfake detection systems.
期刊介绍:
ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods.
ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.