One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework

Shahroz Tariq, Sangyup Lee, Simon S. Woo
{"title":"One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework","authors":"Shahroz Tariq, Sangyup Lee, Simon S. Woo","doi":"10.1145/3442381.3449809","DOIUrl":null,"url":null,"abstract":"Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. While deep learning-based detection methods have been proposed to identify specific types of DFs, their performance suffers for other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and explores spatial as well as the temporal information in a deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment. Whereas our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach with a high-quality DeepFake-in-the-Wild dataset, collected from the Internet containing numerous videos and having more than 150,000 frames. Our CLRNet model demonstrated that it generalizes well against high-quality DFW videos by achieving 93.86% detection accuracy, outperforming existing state-of-the-art defense methods by a considerable margin.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"218 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442381.3449809","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 51

Abstract

Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. While deep learning-based detection methods have been proposed to identify specific types of DFs, their performance suffers on other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and exploits both the spatial and the temporal information in deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment, whereas our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach on a high-quality DeepFake-in-the-Wild dataset collected from the Internet, containing numerous videos with more than 150,000 frames in total. Our CLRNet model generalizes well to high-quality DFW videos, achieving 93.86% detection accuracy and outperforming existing state-of-the-art defense methods by a considerable margin.
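
The abstract describes CLRNet only at a high level: a residual network built from Convolutional LSTM layers that models spatial and temporal artifacts jointly across consecutive frames. The sketch below illustrates that general idea using Keras' ConvLSTM2D layer; the layer widths, block count, input clip length, and the sigmoid classification head are illustrative assumptions, not the authors' exact architecture or training strategy.

```python
# Minimal sketch of a ConvLSTM-based residual classifier for short frame
# sequences, loosely following the CLRNet idea from the abstract.
# All hyperparameters here are assumptions for illustration only.
from tensorflow.keras import layers, Model

def convlstm_residual_block(x, filters):
    """Two stacked ConvLSTM2D layers with a skip (residual) connection."""
    shortcut = x
    y = layers.ConvLSTM2D(filters, kernel_size=3, padding="same",
                          return_sequences=True)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ConvLSTM2D(filters, kernel_size=3, padding="same",
                          return_sequences=True)(y)
    # Project the shortcut with a 1x1 conv (applied per frame) if the
    # channel counts differ, so the two branches can be added.
    if shortcut.shape[-1] != filters:
        shortcut = layers.TimeDistributed(
            layers.Conv2D(filters, kernel_size=1, padding="same"))(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

def build_clrnet_sketch(seq_len=5, size=64, channels=3):
    """Input: a short clip of consecutive face crops; output: P(fake)."""
    inp = layers.Input(shape=(seq_len, size, size, channels))
    x = convlstm_residual_block(inp, 32)
    x = layers.TimeDistributed(layers.MaxPooling2D())(x)
    x = convlstm_residual_block(x, 64)
    x = layers.GlobalAveragePooling3D()(x)   # pool over time and space
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model(inp, out)

model = build_clrnet_sketch()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Feeding a short clip of consecutive frames, rather than single frames, is what allows the ConvLSTM layers to pick up temporal inconsistencies (e.g., frame-to-frame flicker) that purely frame-level CNN detectors tend to miss.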