Face Forgery Detection Based on Fine-Grained Clues and Noise Inconsistency

IEEE Transactions on Artificial Intelligence. Pub Date: 2024-09-06. DOI: 10.1109/TAI.2024.3455311

Dengyong Zhang;Ruiyi He;Xin Liao;Feng Li;Jiaxin Chen;Gaobo Yang

IEEE Transactions on Artificial Intelligence, vol. 6, no. 1, pp. 144-158. Journal Article. https://ieeexplore.ieee.org/document/10669058/

Citations: 0

Abstract

Deepfake detection has attracted growing research attention in media forensics, and a variety of methods have been proposed. However, compression can erase subtle artifacts, so detectors based on convolutional neural networks (CNNs) become ineffective on compressed fake face images. In this work, we propose a two-stream network for deepfake detection. We observe that high-frequency noise features and spatial features are inherently complementary, so both are exploited for face forgery detection. Specifically, we design a double-frequency transformer module (DFTM) to guide the learning of spatial features from local artifact regions. To fuse spatial features and high-frequency noise features effectively, we design a dual-domain attention fusion module (DDAFM). We also introduce a local relationship constraint loss for model training, which requires only image-level labels. We evaluate the proposed approach on five large-scale benchmark datasets, and extensive experimental results demonstrate that it outperforms most state-of-the-art (SOTA) methods.
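The abstract does not say how the high-frequency noise stream is computed. As a rough illustration of the two-stream idea only, the sketch below derives a noise residual from a grayscale image with a fixed 3x3 high-pass kernel and returns it alongside the original pixels. The kernel, the edge handling, and the names `HIGH_PASS`, `high_frequency_residual`, and `two_stream_inputs` are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

Image = List[List[float]]

# Assumed 3x3 high-pass kernel (weights sum to zero, so flat regions map to 0).
# This is an illustrative choice, not the filter used by the authors.
HIGH_PASS = [
    [-1.0, -1.0, -1.0],
    [-1.0, 8.0, -1.0],
    [-1.0, -1.0, -1.0],
]


def high_frequency_residual(image: Image) -> Image:
    """Filter a grayscale image with HIGH_PASS, replicating edge pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += HIGH_PASS[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out


def two_stream_inputs(image: Image) -> Tuple[Image, Image]:
    """Return (spatial input, noise input) for a hypothetical two-stream model."""
    return image, high_frequency_residual(image)
```

In a full detector, each of the two outputs would feed its own feature extractor before a fusion step; those components are outside the scope of this sketch.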

Source journal

IEEE Transactions on Artificial Intelligence

CiteScore

7.70

Self-citation rate

0.00%

Publication count