Forgery-Aware Adaptive Learning With Vision Transformer for Generalized Face Forgery Detection

Impact factor: 8.3; Region 1 (Engineering Technology); JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 5, pp. 4116-4129. Pub Date: 2024-12-24. DOI: 10.1109/TCSVT.2024.3522091

Anwei Luo;Rizhao Cai;Chenqi Kong;Yakun Ju;Xiangui Kang;Jiwu Huang;Alex C. Kot


Citations: 0


Forgery-Aware Adaptive Learning With Vision Transformer for Generalized Face Forgery Detection

With the rapid progress of generative models, the current challenge in face forgery detection is how to effectively detect realistic manipulated faces from different unseen domains. Although previous studies show that models based on a pre-trained Vision Transformer (ViT) can achieve promising results after full fine-tuning on the Deepfake dataset, their generalization performance remains unsatisfactory. To this end, we present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm for generalized face forgery detection, in which the parameters of the pre-trained ViT are kept fixed while the designed adaptive modules are optimized to capture forgery features. Specifically, a global adaptive module is designed to model long-range interactions among input tokens, taking advantage of the self-attention mechanism to mine global forgery clues. To further explore essential local forgery clues, a local adaptive module is proposed to expose local inconsistencies by enhancing local contextual association. In addition, we introduce a fine-grained adaptive learning module that emphasizes the common compact representation of genuine faces through relationship learning in fine-grained pairs, driving the proposed adaptive modules to capture fine-grained forgery-aware information. Extensive experiments demonstrate that FA-ViT achieves state-of-the-art results in cross-dataset evaluation and enhances robustness against unseen perturbations. In particular, FA-ViT achieves AUC scores of 93.83% and 78.32% on the Celeb-DF and DFDC datasets, respectively, in cross-dataset evaluation. The code and trained model have been released at https://github.com/LoveSiameseCat/FAViT.
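
The abstract reports its cross-dataset results as AUC scores. For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC value can be computed from detector scores, using the pairwise-ranking definition: the probability that a randomly chosen forged face receives a higher score than a randomly chosen genuine one, with ties counted as half. The function name `roc_auc`, the label convention (1 = forged, 0 = genuine), and the sample scores are assumptions made for illustration only; they are not taken from the paper or its released code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise-ranking AUC; assumes label 1 = forged, 0 = genuine."""
    forged = [s for y, s in zip(labels, scores) if y == 1]
    genuine = [s for y, s in zip(labels, scores) if y == 0]
    if not forged or not genuine:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for f in forged:
        for g in genuine:
            if f > g:
                wins += 1.0   # forged sample correctly ranked above genuine
            elif f == g:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(forged) * len(genuine))


# Hypothetical detector outputs (higher = more likely forged).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

In this toy example eight of the nine forged/genuine pairs are ranked correctly, so the AUC is 8/9. The quadratic loop is chosen for readability; for large test sets a sort-based implementation is the usual choice.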


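Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months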

Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.