ME-FAS: Multimodal Text Enhancement for Cross-Domain Face Anti-Spoofing

IF 6.3 · CAS Tier 1 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS)
Lvpan Cai;Haowei Wang;Jiayi Ji;Xiaoshuai Sun;Liujuan Cao;Rongrong Ji
{"title":"跨域人脸防欺骗的多模态文本增强","authors":"Lvpan Cai;Haowei Wang;Jiayi Ji;Xiaoshuai Sun;Liujuan Cao;Rongrong Ji","doi":"10.1109/TIFS.2025.3571660","DOIUrl":null,"url":null,"abstract":"The focus of Face Anti-Spoofing (FAS) is shifting toward improving generalization performance in unseen scenarios. Traditional methods employing adversarial learning and meta-learning aim to extract or decouple generalizable features to address these challenges. However, enhancing performance solely through facial features remains challenging without additional informative inputs. To address this, Vision-Language Models (VLMs) with robust generalization capabilities have recently been introduced to FAS. Despite their potential, these VLMs typically adopt a late alignment strategy, relying only on encoder output features for modality alignment, which largely neglects mutual guidance between modalities. To bridge this gap, inspired by recent advancements in prompt learning, we employ learnable prompts and masking as intermediaries to enhance interaction between text and visual modalities, enabling the extraction of more generalizable features. Specifically, we propose ME-FAS, a Modality-Enhanced cross-domain FAS model integrating Prompt Fusion Transfer (PFT) and Text-guided Image Masking (TIM). PFT facilitates the integration of text features with visual information, improving domain adaptability in alignment with the textual context. Meanwhile, TIM leverages text features to mask image patches, directing visual features toward critical generalizable facial information, such as the eyes and mouth. Comprehensive evaluations across multiple benchmarks and various visualizations demonstrate significant performance gains, validating the effectiveness of our proposed approach. Our code and models are available at <uri>https://github.com/clpbc/ME-FAS</uri>","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"5451-5464"},"PeriodicalIF":6.3000,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ME-FAS: Multimodal Text Enhancement for Cross-Domain Face Anti-Spoofing\",\"authors\":\"Lvpan Cai;Haowei Wang;Jiayi Ji;Xiaoshuai Sun;Liujuan Cao;Rongrong Ji\",\"doi\":\"10.1109/TIFS.2025.3571660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The focus of Face Anti-Spoofing (FAS) is shifting toward improving generalization performance in unseen scenarios. Traditional methods employing adversarial learning and meta-learning aim to extract or decouple generalizable features to address these challenges. However, enhancing performance solely through facial features remains challenging without additional informative inputs. To address this, Vision-Language Models (VLMs) with robust generalization capabilities have recently been introduced to FAS. Despite their potential, these VLMs typically adopt a late alignment strategy, relying only on encoder output features for modality alignment, which largely neglects mutual guidance between modalities. To bridge this gap, inspired by recent advancements in prompt learning, we employ learnable prompts and masking as intermediaries to enhance interaction between text and visual modalities, enabling the extraction of more generalizable features. Specifically, we propose ME-FAS, a Modality-Enhanced cross-domain FAS model integrating Prompt Fusion Transfer (PFT) and Text-guided Image Masking (TIM). 
PFT facilitates the integration of text features with visual information, improving domain adaptability in alignment with the textual context. Meanwhile, TIM leverages text features to mask image patches, directing visual features toward critical generalizable facial information, such as the eyes and mouth. Comprehensive evaluations across multiple benchmarks and various visualizations demonstrate significant performance gains, validating the effectiveness of our proposed approach. Our code and models are available at <uri>https://github.com/clpbc/ME-FAS</uri>\",\"PeriodicalId\":13492,\"journal\":{\"name\":\"IEEE Transactions on Information Forensics and Security\",\"volume\":\"20 \",\"pages\":\"5451-5464\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2025-03-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Forensics and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11007113/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11007113/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

The focus of Face Anti-Spoofing (FAS) is shifting toward improving generalization performance in unseen scenarios. Traditional methods employing adversarial learning and meta-learning aim to extract or decouple generalizable features to address these challenges. However, enhancing performance solely through facial features remains challenging without additional informative inputs. To address this, Vision-Language Models (VLMs) with robust generalization capabilities have recently been introduced to FAS. Despite their potential, these VLMs typically adopt a late alignment strategy, relying only on encoder output features for modality alignment, which largely neglects mutual guidance between modalities. To bridge this gap, inspired by recent advancements in prompt learning, we employ learnable prompts and masking as intermediaries to enhance interaction between text and visual modalities, enabling the extraction of more generalizable features. Specifically, we propose ME-FAS, a Modality-Enhanced cross-domain FAS model integrating Prompt Fusion Transfer (PFT) and Text-guided Image Masking (TIM). PFT facilitates the integration of text features with visual information, improving domain adaptability in alignment with the textual context. Meanwhile, TIM leverages text features to mask image patches, directing visual features toward critical generalizable facial information, such as the eyes and mouth. Comprehensive evaluations across multiple benchmarks and various visualizations demonstrate significant performance gains, validating the effectiveness of our proposed approach. Our code and models are available at https://github.com/clpbc/ME-FAS
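The "learnable prompts" mentioned in the abstract follow the prompt-learning line of work (e.g. CoOp-style context optimization). As a rough illustration of that mechanism only, the minimal PyTorch sketch below prepends trainable context tokens to frozen class-token embeddings; the class names, dimensions, and initialization are illustrative assumptions, not the authors' implementation of PFT.

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """Prepends trainable context vectors to frozen class-name token
    embeddings, letting the text branch adapt during training
    (generic prompt learning, not the actual PFT module)."""
    def __init__(self, n_ctx: int = 8, dim: int = 512):
        super().__init__()
        # Trainable context tokens, small random initialization.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)

    def forward(self, class_embed: torch.Tensor) -> torch.Tensor:
        # class_embed: (n_cls, n_tok, dim), e.g. embeddings of
        # hypothetical class phrases like "real face" / "spoof face".
        n_cls = class_embed.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)  # (n_cls, n_ctx, dim)
        return torch.cat([ctx, class_embed], dim=1)        # (n_cls, n_ctx + n_tok, dim)

# Usage with dummy embeddings for two classes of four tokens each:
prompts = LearnablePrompt()(torch.randn(2, 4, 512))
print(prompts.shape)  # torch.Size([2, 12, 512])
```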
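Text-guided Image Masking, as described, uses text features to decide which image patches to suppress. The following is a hypothetical sketch of one way such masking could work with CLIP-style embeddings: patches least similar to the text feature are zeroed out, concentrating the visual representation on text-relevant regions such as the eyes and mouth. The function name, similarity criterion, and mask ratio are assumptions for illustration, not ME-FAS internals.

```python
import torch
import torch.nn.functional as F

def text_guided_mask(patch_feats: torch.Tensor,
                     text_feat: torch.Tensor,
                     mask_ratio: float = 0.5) -> torch.Tensor:
    """Zero out the patches least similar to the text feature.
    patch_feats: (B, N, D) per-patch embeddings; text_feat: (B, D)."""
    # Cosine similarity of every patch to the text embedding: (B, N).
    sim = F.cosine_similarity(patch_feats, text_feat.unsqueeze(1), dim=-1)
    n_mask = int(patch_feats.shape[1] * mask_ratio)
    # Indices of the n_mask least text-relevant patches per sample.
    drop_idx = sim.argsort(dim=1)[:, :n_mask]              # (B, n_mask)
    keep = torch.ones_like(sim)
    keep.scatter_(1, drop_idx, 0.0)                        # 0 = masked patch
    return patch_feats * keep.unsqueeze(-1)

# Usage with dummy ViT-style features (196 patches, 512-dim):
masked = text_guided_mask(torch.randn(2, 196, 512), torch.randn(2, 512))
print(masked.shape)  # torch.Size([2, 196, 512])
```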
Source Journal
IEEE Transactions on Information Forensics and Security (CAS category: Engineering Technology - Electronic & Electrical Engineering)
CiteScore: 14.40
Self-citation rate: 7.40%
Annual publications: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.