Face Forgery Detection Based on Fine-Grained Clues and Noise Inconsistency
Dengyong Zhang; Ruiyi He; Xin Liao; Feng Li; Jiaxin Chen; Gaobo Yang
IEEE Transactions on Artificial Intelligence, vol. 6, no. 1, pp. 144-158
Published: 2024-09-06
DOI: 10.1109/TAI.2024.3455311
URL: https://ieeexplore.ieee.org/document/10669058/
Citations: 0
Abstract
Deepfake detection has attracted growing research attention in media forensics, and a wide range of methods has been proposed. However, compression can eliminate subtle artifacts, so convolutional neural network (CNN)-based detectors lose effectiveness on compressed fake face images. In this work, we propose a two-stream network for deepfake detection. We observe that high-frequency noise features and spatial features are inherently complementary, so we exploit both for face forgery detection. Specifically, we design a double-frequency transformer module (DFTM) to guide the learning of spatial features from local artifact regions. To fuse spatial features and high-frequency noise features effectively, we design a dual-domain attention fusion module (DDAFM). We also introduce a local relationship constraint loss for model training, which requires only image-level labels. We evaluate the proposed approach on five large-scale benchmark datasets, and extensive experimental results demonstrate that it outperforms most state-of-the-art (SOTA) methods.
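The abstract does not say how the high-frequency noise features are extracted, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a fixed 3x3 zero-sum high-pass kernel (a hypothetical choice) and a single-channel image. The names `HIGH_PASS` and `noise_residual` are invented for this example. Filtering suppresses smooth image content and keeps the local residual that a noise stream could then analyze.

```python
import numpy as np

# Hypothetical zero-sum high-pass kernel; the paper's actual filter is not
# given in the abstract, so this choice is an assumption for illustration.
HIGH_PASS = np.array(
    [[-1.0, -1.0, -1.0],
     [-1.0,  8.0, -1.0],
     [-1.0, -1.0, -1.0]]
) / 8.0


def noise_residual(gray: np.ndarray) -> np.ndarray:
    """Return the high-frequency residual of a 2-D grayscale image.

    The image is padded by edge replication so the output has the same
    shape as the input. Because the kernel sums to zero, perfectly flat
    regions map to a residual of zero.
    """
    gray = gray.astype(np.float64)
    height, width = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    residual = np.zeros_like(gray)
    # Accumulate the nine shifted, weighted copies of the image
    # (a direct 3x3 correlation; the kernel is symmetric, so this
    # equals convolution).
    for dy in range(3):
        for dx in range(3):
            residual += HIGH_PASS[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return residual


if __name__ == "__main__":
    image = np.full((8, 8), 100.0)
    image[4, 4] = 108.0  # a single-pixel perturbation on a flat background
    print(noise_residual(image)[3:6, 3:6])
```

In a two-stream design such as the one the abstract describes, a residual map of this kind would feed the noise branch while the unfiltered image feeds the spatial branch. How the two are fused (the DDAFM) is specific to the paper and is not reproduced here.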