A noise-robust and generalizable framework for facial expression recognition

IF 6.8 · CAS Q1 (Computer Science) · COMPUTER SCIENCE, INFORMATION SYSTEMS
Jinglin Zhang, Qiangchang Wang, Jing Li, Yilong Yin
Journal: Information Sciences, Volume 719, Article 122457
DOI: 10.1016/j.ins.2025.122457
Publication date: 2025-06-26
URL: https://www.sciencedirect.com/science/article/pii/S0020025525005894
Citations: 0

Abstract

Facial Expression Recognition (FER) shows promising applicability in various real-world contexts, including criminal investigations and digital entertainment. Existing cross-domain FER methods primarily focus on spatial domain features, which are sensitive to noise. As a result, these methods may propagate noise from the source domain to unseen target domains, degrading recognition performance. To address this, we propose a Noise-Robust and Generalizable framework for FER (NR-GFER), mainly comprising Residual Adapter (RA) and Fourier Prompt (FP) modules and a cross-stage unified fusion mechanism. Specifically, the RA module flexibly transfers the generalization ability of a vision-language large model to FER, and its residual mechanism improves the discriminative ability of spatial domain features. However, the domain gap may lead FER models to capture source domain-specific noise, which adversely affects performance on target domains. To mitigate this, the FP module extracts frequency domain features via the Fourier transform, integrates them with prompts, and reconstructs them back to the spatial domain through the inverse Fourier transform, thus reducing the negative impact of noise from the source domain. Finally, the cross-stage unified fusion mechanism bridges intra-module and inter-module semantic priorities, simplifying hyperparameter optimization. Comprehensive evaluations across seven in-the-wild FER datasets confirm that our NR-GFER achieves state-of-the-art performance.
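The two module ideas in the abstract (a residual adapter on spatial features, and a Fourier transform → prompt integration → inverse transform pipeline) can be sketched in a few lines. The shapes, the additive frequency-domain prompt, and the `residual_adapter`/`fourier_prompt` helpers below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def residual_adapter(feature, adapter):
    """Hypothetical RA step: a lightweight adapter whose output is added
    back to the backbone feature through a residual connection."""
    return feature + adapter(feature)

def fourier_prompt(feature, prompt):
    """Hypothetical FP step: map a spatial feature map to the frequency
    domain, integrate an (assumed additive) prompt there, then reconstruct
    the spatial feature with the inverse Fourier transform."""
    spectrum = np.fft.fft2(feature)        # spatial -> frequency domain
    spectrum = spectrum + prompt           # blend in the prompt
    return np.fft.ifft2(spectrum).real     # frequency -> spatial domain

# Sanity check: a round trip with a zero prompt reproduces the input map.
x = np.random.rand(8, 8)
y = fourier_prompt(x, np.zeros((8, 8), dtype=complex))
```

In this sketch the prompt acts as a learnable spectral offset, so noise concentrated in particular frequency bands can in principle be suppressed before the feature is mapped back to the spatial domain.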
Source journal: Information Sciences (Engineering & Technology - Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles per year: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.