WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection

Impact factor 2.7 | JCR Q2 | Engineering, Electrical & Electronic

IEEE Open Journal of Signal Processing | Pub Date: 2025-03-20 | DOI: 10.1109/OJSP.2025.3571679

Nour Eldin Alaa Badr; Jean-Christophe Nebel; Darrel Greenhill; Xing Liang

Volume 6, pages 621-630 | Journal Article | https://ieeexplore.ieee.org/document/11007485/

Citations: 0

Abstract


WaViT-CDC: Wavelet Vision Transformer With Central Difference Convolutions for Spatial-Frequency Deepfake Detection

The increasing popularity of generative AI has led to a significant rise in deepfake content, creating an urgent need for generalized and reliable deepfake detection methods. Since existing approaches rely on either spatial-domain features or frequency-domain features, they struggle to generalize across unseen datasets, especially those with subtle manipulations. To address these challenges, a novel end-to-end Wavelet Central Difference Convolutional Vision Transformer framework is designed to enhance spatial-frequency deepfake detection. Unlike previous methods, this approach applies the Discrete Wavelet Transform for multi-level frequency decomposition and Central Difference Convolution to capture local fine-grained discrepancies and focus on texture variances, while also incorporating Vision Transformers for global contextual understanding. The Frequency-Spatial Feature Fusion Attention module integrates these features, enabling the effective detection of fake artifacts. Moreover, in contrast to earlier work, subtle perturbations to both spatial and frequency domains are introduced to further improve generalization. Cross-dataset generalization evaluations demonstrate that WaViT-CDC outperforms state-of-the-art methods when trained on both low-quality and high-quality face images, achieving an average performance increase of 2.5% and 4.5% on challenging high-resolution, real-world datasets such as Celeb-DF and WildDeepfake.
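For readers unfamiliar with the two local operators named in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a one-level Haar wavelet (the abstract does not say which wavelet is used), a single-channel image stored as a nested list, and the commonly cited central difference convolution form, in which the centre pixel scaled by a hypothetical weight `theta` is subtracted from an ordinary convolution. The function names, sub-band labels, and the default `theta=0.7` are assumptions made for illustration only.

```python
def haar_dwt2(img):
    """One level of an orthonormal 2D Haar DWT on an even-sized 2D list.

    Assumption: Haar is used purely for illustration; sub-band naming
    conventions (LH vs. HL) differ between libraries.
    """
    rows, cols = len(img) // 2, len(img[0]) // 2
    bands = {name: [[0.0] * cols for _ in range(rows)]
             for name in ("LL", "LH", "HL", "HH")}
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            bands["LL"][i][j] = (a + b + c + d) / 2  # low-pass in both directions
            bands["LH"][i][j] = (a + b - c - d) / 2  # difference between the two rows
            bands["HL"][i][j] = (a - b + c - d) / 2  # difference between the two columns
            bands["HH"][i][j] = (a - b - c + d) / 2  # diagonal difference
    return bands


def central_difference_conv(img, kernel, theta=0.7):
    """'Valid' central difference convolution with a square odd-sized kernel.

    Computes sum_n w_n * x_n  -  theta * x_center * sum_n w_n.
    theta=0 gives a plain convolution; theta=1 responds only to local
    differences from the centre pixel. theta=0.7 is an assumed default.
    """
    k = len(kernel)
    r = k // 2
    w_sum = sum(sum(row) for row in kernel)
    out = []
    for i in range(r, len(img) - r):
        out_row = []
        for j in range(r, len(img[0]) - r):
            vanilla = sum(kernel[u][v] * img[i - r + u][j - r + v]
                          for u in range(k) for v in range(k))
            out_row.append(vanilla - theta * img[i][j] * w_sum)
        out.append(out_row)
    return out


# A perfectly flat patch has no texture: all detail sub-bands are zero,
# and the pure central-difference response (theta=1) vanishes.
flat = [[1.0] * 4 for _ in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
bands = haar_dwt2(flat)
cdc_flat = central_difference_conv(flat, ones, theta=1.0)
```

The flat-patch check hints at why such operators suit texture-focused detection: both the detail sub-bands and the central-difference term ignore constant intensity and respond only to local variation.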


Source journal

IEEE Open Journal of Signal Processing

CiteScore

5.30

Self-citation rate

0.00%

Articles published

Review time

22 weeks