Rethinking Open-World DeepFake Attribution with Multi-perspective Sensory Learning

IF 11.6; Zone 2 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

International Journal of Computer Vision; Pub Date: 2024-08-12; DOI: 10.1007/s11263-024-02184-7

Zhimin Sun, Shen Chen, Taiping Yao, Ran Yi, Shouhong Ding, Lizhuang Ma


Citations: 0

Abstract

Source attribution for forged faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks based on identity swapping or diffusion models are still overlooked, and the forgery traces hidden in unknown attacks among open-world unlabeled faces remain under-explored. To advance research on this frontier, we introduce a novel task named Open-World DeepFake Attribution, together with the corresponding benchmark OW-DFA++, which evaluates attribution performance against various types of fake faces in open-world scenarios. We also propose a Multi-Perspective Sensory Learning (MPSL) framework to address the challenges of OW-DFA++. Since different forged faces have different tampering regions and frequency artifacts, we introduce the Multi-Perception Voting (MPV) module, which aligns inter-sample features based on global, multi-scale local, and frequency relations. The MPV module effectively filters and groups together samples belonging to the same attack type. Pseudo-labeling is another common and effective strategy in semi-supervised learning, so we propose the Confidence-Adaptive Pseudo-labeling (CAP) module, which uses soft pseudo-labeling to enhance class compactness and mitigate pseudo-noise induced by similar novel attack methods. The CAP module imposes strong constraints and adaptively filters samples with high uncertainty to improve pseudo-labeling accuracy. In addition, we extend the MPSL framework with a multi-stage paradigm that leverages pre-training and iterative learning to further enhance traceability performance. Extensive experiments and visualizations verify the superiority of our proposed method on OW-DFA++ and demonstrate the interpretability of the deepfake attribution task and its impact on improving the security of deepfake detection.
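The abstract describes the MPV and CAP modules only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes that "voting" means several feature views (global, local, frequency) must agree that two samples are nearest neighbors, and that "confidence-adaptive" filtering keeps soft pseudo-labels whose confidence is at least the batch mean. The function names, the `min_sim` and `min_votes` parameters, the temperature, and the mean-confidence threshold are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbor(index, feats):
    """Return (j, similarity) for the sample most similar to feats[index]."""
    best, best_sim = None, -2.0
    for j, f in enumerate(feats):
        if j == index:
            continue
        sim = cosine(feats[index], f)
        if sim > best_sim:
            best, best_sim = j, sim
    return best, best_sim


def vote_pairs(views, min_votes=2, min_sim=0.8):
    """Hypothetical voting rule: keep a pair (i, j) when at least
    `min_votes` views name j as i's nearest neighbor with cosine
    similarity >= `min_sim`. `views` maps a view name to a list of
    feature vectors, one per sample, in the same sample order."""
    n = len(next(iter(views.values())))
    pairs = set()
    for i in range(n):
        votes = {}
        for feats in views.values():
            j, sim = nearest_neighbor(i, feats)
            if j is not None and sim >= min_sim:
                votes[j] = votes.get(j, 0) + 1
        for j, count in votes.items():
            if count >= min_votes:
                pairs.add((min(i, j), max(i, j)))
    return pairs


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_pseudo_labels(batch_logits, temperature=0.5):
    """Hypothetical adaptive filter: return (probs, keep) per sample,
    where keep is True if the top probability is at least the mean
    top probability of the batch (an assumed threshold rule)."""
    probs = [softmax(logits, temperature) for logits in batch_logits]
    conf = [max(p) for p in probs]
    threshold = sum(conf) / len(conf)
    return [(p, c >= threshold) for p, c in zip(probs, conf)]


# Toy data: samples 0 and 1 look alike in every view, sample 2 does not.
views = {
    "global": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    "local": [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0]],
    "frequency": [[0.0, 1.0], [0.1, 1.0], [1.0, 0.0]],
}
pairs = vote_pairs(views)
labels = soft_pseudo_labels([[4.0, 0.0, 0.0], [1.0, 0.9, 0.8]])
```

In this toy run, only samples 0 and 1 are grouped, and only the first, sharply peaked prediction survives the confidence filter. A real system would operate on learned network features and would likely use a more elaborate threshold schedule than a batch mean.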




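Source journal

International Journal of Computer Vision (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 29.80

Self-citation rate: 2.10%

Articles published: 163

Review time: 6 months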

Journal description: The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs. Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision. Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community. Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or offer tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas. In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research. Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.