A Hybrid Deep Learning Framework for Deepfake Detection Using Temporal and Spatial Features

Impact Factor 3.4 · CAS Tier 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Fazeel Zafar;Talha Ahmed Khan;Salas Akbar;Muhammad Talha Ubaid;Sameena Javaid;Kushsairy Abdul Kadir
DOI: 10.1109/ACCESS.2025.3566008
Journal: IEEE Access, vol. 13, pp. 79560-79570
Published: 2025-04-30 (Journal Article)
Full text (IEEE Xplore): https://ieeexplore.ieee.org/document/10981422/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10981422
Citations: 0

Abstract

The rise of deepfake technology, powered by Generative Adversarial Networks (GANs), has sparked concern because it blurs the distinction between authentic and fabricated media. The resulting privacy and security issues have eroded trust in online interactions, underscoring the need for reliable detection methods. Our research introduces a deepfake detection model that combines an enhanced EfficientNet-B0 backbone with Temporal Convolutional Neural Networks (TempCNNs), targeting the challenges posed by increasingly sophisticated deepfake techniques.

The system splits video inputs into frames and uses Multi-task Cascaded Convolutional Networks (MTCNN) for face detection and alignment, so that feature extraction focuses on facial regions. To improve the model's adaptability to different scenarios and datasets, we apply data augmentation techniques such as CutMix, MixUp, and Random Erasing; these strategies keep the model robust against the distortions found in deepfake content. The EfficientNet-B0 backbone uses Mobile Inverted Bottleneck Convolutions (MBConv) and Squeeze-and-Excitation (SE) blocks, which recalibrate channel responses to highlight discriminative details. A Feature Pyramid Network (FPN) then fuses multi-scale features, capturing fine detail as well as broader context.

Tested on the FFIW-10K dataset, which comprises 10,000 videos evenly split between real and manipulated content, the model attained 91.5% training accuracy and 92.45% testing accuracy after 40 epochs. These results demonstrate the model's ability to identify manipulated videos precisely while handling the class imbalance common in deepfake datasets, a valuable contribution toward dependable detection solutions. Furthermore, the model strikes a strong balance between accuracy and computational efficiency, reaching 92.45% testing accuracy at a lightweight cost of 0.45 GFLOPs, which makes it practical for real-world deployment.
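The augmentation strategies named in the abstract are standard and easy to illustrate. Below is a minimal NumPy sketch (not the paper's implementation; array shapes, label encoding, and the Beta parameters are illustrative assumptions): MixUp blends two samples and their labels with a Beta-sampled weight, while CutMix pastes a random rectangle from one image into another and mixes labels by the pasted area.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """MixUp: blend two samples and their labels with a Beta-sampled weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2          # pixel-wise blend
    y = lam * y1 + (1.0 - lam) * y2          # soft label
    return x, y, lam

def cutmix(x1, y1, x2, y2, alpha=1.0, rng=None):
    """CutMix: paste a random rectangle from x2 into x1; mix labels by area."""
    rng = rng or np.random.default_rng()
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    r0, r1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    c0, c1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    out = x1.copy()
    out[r0:r1, c0:c1] = x2[r0:r1, c0:c1]
    lam_adj = 1 - (r1 - r0) * (c1 - c0) / (h * w)   # fraction of x1 kept
    return out, lam_adj * y1 + (1 - lam_adj) * y2
```

Random Erasing works analogously to the CutMix patch step, but fills the rectangle with noise or a constant rather than pixels from a second image.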
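The Squeeze-and-Excitation mechanism mentioned in the abstract can be sketched in a few lines: global average pooling "squeezes" each channel to a scalar, a small bottleneck MLP "excites" per-channel gates, and the feature map is rescaled channel-wise. A NumPy illustration follows (the weight shapes and reduction ratio are assumptions for exposition, not values from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(feat, w1, w2):
    """Squeeze-and-Excitation: reweight the channels of an (H, W, C) map.

    feat: (H, W, C) feature map
    w1:   (C, C // r) reduction weights (r = reduction ratio)
    w2:   (C // r, C) expansion weights
    """
    s = feat.mean(axis=(0, 1))                  # squeeze: global avg pool -> (C,)
    e = sigmoid(np.maximum(s @ w1, 0.0) @ w2)   # excite: FC -> ReLU -> FC -> sigmoid
    return feat * e                             # scale each channel by its gate
```

In EfficientNet-B0 this sits inside each MBConv block, letting the network emphasize channels that carry manipulation artifacts.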
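The temporal branch (TempCNN) convolves along the time axis of per-frame feature vectors to capture inter-frame inconsistencies. A single-filter NumPy sketch of the core operation (shapes and the filter itself are illustrative, not the paper's architecture):

```python
import numpy as np

def temporal_conv(seq, kernel):
    """1-D convolution across the time axis of per-frame features.

    seq:    (T, D) array of T frame-level feature vectors
    kernel: (K,) temporal filter applied identically to every feature dim
    Returns an (T - K + 1, D) array of valid-mode responses.
    """
    T, _ = seq.shape
    K = len(kernel)
    # each output step is a weighted sum of K consecutive frames
    return np.array([kernel @ seq[t:t + K] for t in range(T - K + 1)])
```

A stack of such layers (with many filters and nonlinearities) is what lets the model flag temporal artifacts, such as flickering facial regions, that single-frame classifiers miss.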
Source Journal

IEEE Access — COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles per year: 6673
Review time: 6 weeks
Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access publishes articles that are of high interest to readers: original, technically correct, and clearly presented. Supported by article processing charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary": reviewers either Accept or Reject an article in the form it is submitted, in order to achieve rapid turnaround. Especially encouraged are submissions on: multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals; practical articles discussing new experiments or measurement techniques, and interesting solutions to engineering problems; development of new or improved fabrication or manufacturing techniques; and reviews or survey articles of new or evolving fields, oriented to assist others in understanding the new area.