A noise-robust and generalizable framework for facial expression recognition

IF 6.8 · CAS Q1 (Computer Science) · COMPUTER SCIENCE, INFORMATION SYSTEMS
Jinglin Zhang, Qiangchang Wang, Jing Li, Yilong Yin
Journal: Information Sciences, Volume 719, Article 122457
DOI: 10.1016/j.ins.2025.122457
Publication date: 2025-06-26
URL: https://www.sciencedirect.com/science/article/pii/S0020025525005894
Citations: 0

Abstract

Facial Expression Recognition (FER) shows promising applicability in various real-world contexts, including criminal investigations and digital entertainment. Existing cross-domain FER methods primarily focus on spatial domain features, which are sensitive to noise. As a result, these methods may propagate noise from the source domain to unseen target domains, degrading recognition performance. To address this, we propose a Noise-Robust and Generalizable framework for FER (NR-GFER), mainly comprising Residual Adapter (RA) and Fourier Prompt (FP) modules and a cross-stage unified fusion mechanism. Specifically, the RA module flexibly transfers the generalization ability of a vision-language large model to FER, and its residual mechanism improves the discriminative ability of spatial domain features. However, the domain gap may lead FER models to capture source domain-specific noise, which adversely affects performance on target domains. To mitigate this, the FP module extracts frequency domain features via the Fourier transform, integrates them with prompts, and reconstructs them back to the spatial domain through the inverse Fourier transform, thus reducing the negative impact of noise from the source domain. Finally, the cross-stage unified fusion mechanism bridges intra-module and inter-module semantic priorities, simplifying hyperparameter optimization. Comprehensive evaluations across seven in-the-wild FER datasets confirm that our NR-GFER achieves state-of-the-art performance.
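The two module ideas in the abstract (a residual adapter on spatial features, and a Fourier transform → prompt integration → inverse transform pipeline) can be sketched in a few lines. The shapes, the additive frequency-domain prompt, and the `residual_adapter`/`fourier_prompt` helpers below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def residual_adapter(feature, adapter):
    """Hypothetical RA step: a lightweight adapter whose output is added
    back to the backbone feature through a residual connection."""
    return feature + adapter(feature)

def fourier_prompt(feature, prompt):
    """Hypothetical FP step: map a spatial feature map to the frequency
    domain, integrate an (assumed additive) prompt there, then reconstruct
    the spatial feature with the inverse Fourier transform."""
    spectrum = np.fft.fft2(feature)        # spatial -> frequency domain
    spectrum = spectrum + prompt           # blend in the prompt
    return np.fft.ifft2(spectrum).real     # frequency -> spatial domain

# Sanity check: a round trip with a zero prompt reproduces the input map.
x = np.random.rand(8, 8)
y = fourier_prompt(x, np.zeros((8, 8), dtype=complex))
```

In this sketch the prompt acts as a learnable spectral offset, so noise concentrated in particular frequency bands can in principle be suppressed before the feature is mapped back to the spatial domain.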
Source journal: Information Sciences (Engineering & Technology - Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles per year: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.