Li Wang, Xiangtao Meng, Dan Li, Xuhong Zhang, Shouling Ji, Shanqing Guo
{"title":"DEEPFAKER: A Unified Evaluation Platform for Facial Deepfake and Detection Models","authors":"Li Wang, Xiangtao Meng, Dan Li, Xuhong Zhang, Shouling Ji, Shanqing Guo","doi":"10.1145/3634914","DOIUrl":null,"url":null,"abstract":"<p>DeepFake data contains realistically manipulated faces - its abuses pose a huge threat to the security and privacy-critical applications. Intensive research from academia and industry has produced many deepfake/detection models, leading to a constant race of attack and defense. However, due to the lack of a unified evaluation platform, many critical questions on this subject remain largely unexplored. <i>(i)</i> How is the anti-detection ability of the existing deepfake models? <i>(ii)</i> How generalizable are existing detection models against different deepfake samples? <i>(iii)</i> How effective are the detection APIs provided by the cloud-based vendors? <i>(iv)</i> How evasive and transferable are adversarial deepfakes in the lab and real-world environment? <i>(v)</i> How do various factors impact the performance of deepfake and detection models? </p><p>To bridge the gap, we design and implement <monospace>DEEPFAKER</monospace>, a unified and comprehensive deepfake-detection evaluation platform. Specifically, <monospace>DEEPFAKER</monospace> has integrated 10 state-of-the-art deepfake methods and 9 representative detection methods, while providing a user-friendly interface and modular design that allows for easy integration of new methods. 
Leveraging <monospace>DEEPFAKER</monospace>, we conduct a large-scale empirical study of facial deepfake/detection models and draw a set of key findings: <i>(i)</i> the detection methods have poor generalization on samples generated by different deepfake methods; <i>(ii)</i> there is no significant correlation between anti-detection ability and visual quality of deepfake samples; <i>(iii)</i> the current detection APIs have poor detection performance and adversarial deepfakes can achieve about 70% ASR (attack success rate) on all cloud-based vendors, calling for an urgent need to deploy effective and robust detection APIs; <i>(iv)</i> the detection methods in the lab are more robust against transfer attacks than the detection APIs in the real-world environment; <i>(v)</i> deepfake videos may not always be more difficult to detect after video compression. We envision that <monospace>DEEPFAKER</monospace> will benefit future research on facial deepfake and detection.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3634914","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
DeepFake data contain realistically manipulated faces, and their abuse poses a serious threat to security- and privacy-critical applications. Intensive research from academia and industry has produced many deepfake and detection models, leading to a constant race between attack and defense. However, due to the lack of a unified evaluation platform, many critical questions on this subject remain largely unexplored. (i) How strong is the anti-detection ability of existing deepfake models? (ii) How well do existing detection models generalize to samples produced by different deepfake methods? (iii) How effective are the detection APIs provided by cloud-based vendors? (iv) How evasive and transferable are adversarial deepfakes in lab and real-world environments? (v) How do various factors affect the performance of deepfake and detection models?
To bridge this gap, we design and implement DEEPFAKER, a unified and comprehensive deepfake-detection evaluation platform. Specifically, DEEPFAKER integrates 10 state-of-the-art deepfake methods and 9 representative detection methods, and provides a user-friendly interface and a modular design that allows easy integration of new methods. Leveraging DEEPFAKER, we conduct a large-scale empirical study of facial deepfake and detection models and draw a set of key findings: (i) detection methods generalize poorly to samples generated by different deepfake methods; (ii) there is no significant correlation between the anti-detection ability and the visual quality of deepfake samples; (iii) current detection APIs perform poorly, and adversarial deepfakes achieve roughly 70% ASR (attack success rate) against all cloud-based vendors, underscoring an urgent need for effective and robust detection APIs; (iv) detection methods in the lab are more robust against transfer attacks than detection APIs in the real-world environment; (v) deepfake videos are not always harder to detect after video compression. We envision that DEEPFAKER will benefit future research on facial deepfake and detection.
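The attack success rate (ASR) cited in finding (iii) is conventionally the fraction of adversarial samples that evade the detector. The following is a minimal sketch of that computation; the function name and the example predictions are hypothetical, not taken from the paper:

```python
def attack_success_rate(flagged_fake):
    """ASR over a batch of adversarial deepfake samples.

    `flagged_fake` is a list of booleans, one per adversarial sample:
    True if the detector correctly flagged the sample as fake,
    False if the sample evaded detection (was judged real).
    """
    if not flagged_fake:
        raise ValueError("need at least one prediction")
    evaded = sum(1 for caught in flagged_fake if not caught)
    return evaded / len(flagged_fake)

# Hypothetical run: the detector catches only 3 of 10 adversarial deepfakes,
# so 7 of 10 evade it.
preds = [True, False, False, True, False,
         False, False, True, False, False]
print(attack_success_rate(preds))  # 0.7
```

An ASR of about 0.7, as reported for the cloud-based detection APIs, would correspond to roughly 7 in 10 adversarial deepfakes passing as real.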