{"title":"Multi-domain awareness for compressed deepfake videos detection over social networks guided by common mechanisms between artifacts","authors":"","doi":"10.1016/j.cviu.2024.104072","DOIUrl":null,"url":null,"abstract":"<div><p>The viral spread of massive deepfake videos over social networks has caused serious security problems. Despite the remarkable advancements achieved by existing deepfake detection algorithms, deepfake videos over social networks are inevitably influenced by compression factors. This causes deepfake detection performance to be limited by the following challenging issues: (a) interfering with compression artifacts, (b) loss of feature information, and (c) aliasing of feature distributions. In this paper, we analyze the common mechanism between compression artifacts and deepfake artifacts, revealing the structural similarity between them and providing a reliable theoretical basis for enhancing the robustness of deepfake detection models against compression. Firstly, based on the common mechanism between artifacts, we design a frequency domain adaptive notch filter to eliminate the interference of compression artifacts on specific frequency bands. Secondly, to reduce the sensitivity of deepfake detection models to unknown noise, we propose a spatial residual denoising strategy. Thirdly, to exploit the intrinsic correlation between feature vectors in the frequency domain branch and the spatial domain branch, we enhance deepfake features using an attention-based feature fusion method. Finally, we adopt a multi-task decision approach to enhance the discriminative power of the latent space representation of deepfakes, achieving deepfake detection with robustness against compression. Extensive experiments show that compared with the baseline methods, the detection performance of the proposed algorithm on compressed deepfake videos has been significantly improved. In particular, our model is resistant to various types of noise disturbances and can be easily combined with baseline detection models to improve their robustness.</p></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S107731422400153X","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
The viral spread of massive deepfake videos over social networks has caused serious security problems. Despite the remarkable advancements achieved by existing deepfake detection algorithms, deepfake videos over social networks are inevitably influenced by compression factors. This causes deepfake detection performance to be limited by the following challenging issues: (a) interfering with compression artifacts, (b) loss of feature information, and (c) aliasing of feature distributions. In this paper, we analyze the common mechanism between compression artifacts and deepfake artifacts, revealing the structural similarity between them and providing a reliable theoretical basis for enhancing the robustness of deepfake detection models against compression. Firstly, based on the common mechanism between artifacts, we design a frequency domain adaptive notch filter to eliminate the interference of compression artifacts on specific frequency bands. Secondly, to reduce the sensitivity of deepfake detection models to unknown noise, we propose a spatial residual denoising strategy. Thirdly, to exploit the intrinsic correlation between feature vectors in the frequency domain branch and the spatial domain branch, we enhance deepfake features using an attention-based feature fusion method. Finally, we adopt a multi-task decision approach to enhance the discriminative power of the latent space representation of deepfakes, achieving deepfake detection with robustness against compression. Extensive experiments show that compared with the baseline methods, the detection performance of the proposed algorithm on compressed deepfake videos has been significantly improved. In particular, our model is resistant to various types of noise disturbances and can be easily combined with baseline detection models to improve their robustness.
期刊介绍:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems