Advanced deepfake detection with enhanced Resnet-18 and multilayer CNN max pooling

The Visual Computer | Pub Date: 2024-09-18 | DOI: 10.1007/s00371-024-03613-x

Muhammad Fahad, Tao Zhang, Yasir Iqbal, Azaz Ikram, Fazeela Siddiqui, Bin Younas Abdullah, Malik Muhammad Nauman, Xin Zhao, Yanzhang Geng

Citations: 0

Abstract

Advanced deepfake detection with enhanced Resnet-18 and multilayer CNN max pooling

Artificial intelligence has revolutionized technology, and generative adversarial networks (GANs) can now generate fake samples and deepfake videos. These technologies allow anyone to produce propaganda and can lead to panic and instability. It is therefore crucial to develop a robust system that distinguishes authentic from counterfeit information in the current social media era. This study offers an automated approach for categorizing deepfake videos using advanced machine learning and deep learning techniques. The processed videos are classified using a deep learning-based enhanced Resnet-18 with convolutional neural network (CNN) multilayer max pooling. This research contributes to the study of precise detection techniques for deepfake technology, which is gradually becoming a serious problem for digital media. The proposed enhanced Resnet-18 CNN method integrates deep learning algorithms on GAN architecture and artificial intelligence-generated videos to analyze videos and determine whether they are genuine or fake. In this research, we fuse the sub-datasets (faceswap, face2face, deepfakes, neural textures) of FaceForensics, CelebDF, DeeperForensics, DeepFake detection, and our own private dataset into one combined dataset, which contains 11,404 videos in total. The dataset on which the model was trained covers a diverse range of videos and sentiments, demonstrating its capability. The model is designed to accurately identify videos with switched faces as fake and videos without switches as real. This paper advances digital forensics by providing an effective response to deepfakes. In a comparative analysis, the proposed model outperformed conventional methods in frame-level prediction, with an accuracy of 99.99%, an F-score of 99.98%, a recall of 100%, and a precision of 99.99%. The source code of this study is publicly available at https://doi.org/10.5281/zenodo.12538330.
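
The abstract reports four evaluation figures: accuracy, precision, recall, and F-score. As a reminder of how these quantities are usually derived from a binary confusion matrix, here is a minimal Python sketch. The names `ConfusionCounts` and `binary_metrics`, the choice of "fake" as the positive class, and the example counts are all illustrative assumptions; none of them comes from the paper or its released code, and the paper may compute or average its metrics differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical frame-level counts; 'fake' is treated as the positive class."""
    tp: int  # fake frames predicted as fake
    fp: int  # real frames predicted as fake
    fn: int  # fake frames predicted as real
    tn: int  # real frames predicted as real


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 for a binary classifier."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only; these are NOT the paper's results.
example = ConfusionCounts(tp=90, fp=10, fn=0, tn=100)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, recall is perfect because no fake frame is missed, while precision is lower because some real frames are flagged as fake. This is why the four figures are normally reported together rather than accuracy alone.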

Source journal

The Visual Computer

Self-citation rate

0.00%

Number of publications