Dengyong Zhang;Ruiyi He;Xin Liao;Feng Li;Jiaxin Chen;Gaobo Yang
Journal: IEEE Transactions on Artificial Intelligence, volume 6, issue 1, pages 144-158
Published: 2024-09-06
Publication type: Journal Article
DOI: 10.1109/TAI.2024.3455311
URL: https://ieeexplore.ieee.org/document/10669058/
Citations: 0
Face Forgery Detection Based on Fine-Grained Clues and Noise Inconsistency
Deepfake detection has attracted growing research attention in media forensics, and many detectors have been proposed. However, compression can eliminate subtle artifacts, so detectors based on convolutional neural networks (CNNs) become ineffective on compressed fake face images. In this work, we propose a two-stream network for deepfake detection. We observe that high-frequency noise features and spatial features are inherently complementary, so the network exploits both for face forgery detection. Specifically, we design a double-frequency transformer module (DFTM) to guide the learning of spatial features from local artifact regions, and a dual-domain attention fusion module (DDAFM) to fuse the spatial and high-frequency noise features effectively. We also introduce a local relationship constraint loss for model training, which requires only image-level labels. We evaluate the proposed approach on five large-scale benchmark datasets, and extensive experimental results demonstrate that it outperforms most state-of-the-art (SOTA) methods.
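To make the idea of a high-frequency noise feature concrete, the sketch below computes a simple noise residual by subtracting a 3x3 mean-filtered copy of an image from the image itself. This is only an illustration under stated assumptions: the abstract does not say which high-pass filter the noise stream uses, and the names `box_blur3` and `high_pass_residual` are hypothetical, not taken from the paper.

```python
# Illustrative only: a generic high-pass residual, NOT the paper's noise stream.
# Assumption: a 3x3 mean filter stands in for whatever low-pass step is used.

def box_blur3(img):
    """Apply a 3x3 mean filter with edge replication to a 2D list of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbor.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def high_pass_residual(img):
    """Return img minus its blurred copy, keeping only high-frequency content."""
    blurred = box_blur3(img)
    return [
        [img[y][x] - blurred[y][x] for x in range(len(img[0]))]
        for y in range(len(img))
    ]


# A flat patch has no high-frequency content; a single altered pixel does.
flat = [[10.0] * 4 for _ in range(4)]
spike = [row[:] for row in flat]
spike[1][2] = 19.0

flat_residual = high_pass_residual(flat)
spike_residual = high_pass_residual(spike)
```

In this toy input the residual is zero everywhere on the flat patch and is largest at the altered pixel, which is the kind of local inconsistency a noise stream could pick up alongside the spatial stream.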