Deepfake detection: Enhancing performance with spatiotemporal texture and deep learning feature fusion

Impact Factor 5 | Zone 3 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence

Egyptian Informatics Journal Pub Date : 2024-09-01 DOI:10.1016/j.eij.2024.100535

Abdelwahab Almestekawy, Hala H. Zayed, Ahmed Taha


Citations: 0

Abstract

Deepfakes raise critical ethical issues concerning consent, authenticity, and the manipulation of digital content. Identifying Deepfake videos is one step towards countering their malicious use. Although previous works have introduced accurate methods for Deepfake detection, the stability of those methods is rarely discussed. This paper aims to build a stable model for Deepfake detection whose results are reproducible: if other researchers repeat the same experiments, the results should not differ. The proposed technique combines multiple spatiotemporal texture features with deep learning-based features. An enhanced 3D Convolutional Neural Network containing a spatiotemporal attention layer is used in a Siamese architecture. Analyses are carried out on the control parameters, feature importance, and reproducibility of results. Our technique is tested on four datasets: Celeb-DF, FaceForensics++, DeepfakeTIMIT, and FaceShifter. The results demonstrate that a Siamese architecture can improve the accuracy of 3D Convolutional Neural Networks by 7.9% and reduce the standard deviation of accuracy to 0.016, which indicates reproducible results. Furthermore, adding texture features enhances accuracy by up to 91.96%. The final model achieves an Area Under Curve (AUC) of up to 97.51% and 95.44% in same-dataset and cross-dataset scenarios, respectively. The main contributions of this work are improved model stability and repeatable results, giving consistent results with high accuracy.
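The abstract reports two kinds of figures: the standard deviation of accuracy across repeated experiments (as a stability measure) and the AUC. The sketch below shows one common way such figures could be computed. It is not the authors' evaluation code: the function names `accuracy_spread` and `auc`, the use of the population standard deviation, the label convention (1 = fake, 0 = real), and all sample numbers are assumptions made for illustration.

```python
from statistics import mean, pstdev


def accuracy_spread(run_accuracies):
    """Return (mean, population standard deviation) of accuracy over repeated runs.

    Assumption: each entry is the test accuracy of one independent training run.
    """
    if len(run_accuracies) < 2:
        raise ValueError("need at least two runs to measure spread")
    return mean(run_accuracies), pstdev(run_accuracies)


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as half).

    labels: 1 for fake, 0 for real (an assumed convention).
    scores: higher means "more likely fake".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # Count how often a fake sample is scored above a real one.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical numbers, not taken from the paper.
runs = [0.90, 0.92, 0.91, 0.93, 0.89]
avg, spread = accuracy_spread(runs)
print(f"mean accuracy = {avg:.3f}, std = {spread:.3f}")

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(f"AUC = {auc(labels, scores):.3f}")
```

A smaller spread across runs means the reported accuracy depends less on random initialisation or data ordering, which is the sense of "reproducible" used in the abstract. Whether the authors used the population or the sample standard deviation is not stated there.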


Source journal

Egyptian Informatics Journal (Decision Sciences: Management Science and Operations Research)

CiteScore

11.10

Self-citation rate

1.90%

Articles published

Review time

110 days

About the journal: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer sciences, information technologies, information systems, operations research and decision support. Submissions of innovative, previously unpublished work in subjects covered by the Journal are encouraged, whether from academic, research or commercial sources.