{"title":"Prompt-Based Cross-Modal Feature Alignment for Weakly Supervised IFER","authors":"Hanqin Shi;Xiaofeng Kang;Jiaxiang Wang;Aihua Zheng;Wenjuan Cheng","doi":"10.1109/LSP.2025.3601513","DOIUrl":null,"url":null,"abstract":"Infrared Facial Expression Recognition (IFER) encounters challenges in data acquisition and annotation under low-light conditions, making fully supervised training difficult. Although pre-trained Vision-Language Models (VLMs) can enhance generalization for downstream tasks, their insufficient attention modeling in cross-domain scenarios leads to ineffective local semantic correlation. To address this, we propose a Prompt-based Cross-modal feature Alignment (PCA) method that improves weakly supervised IFER performance by leveraging RGB facial expression data. The PCA framework comprises two key components: (1) a Cross-modal Prompt Transfer (CPT) strategy that integrates category-specific information to distinguish expressions, and (2) an Image-Guided Alignment (IGA) module that achieves feature alignment using dual-domain feature banks. Experimental results on two benchmark datasets demonstrate that our method significantly outperforms current state-of-the-art approaches, confirming its effectiveness and superiority.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3410-3414"},"PeriodicalIF":3.9000,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11131686/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Infrared Facial Expression Recognition (IFER) encounters challenges in data acquisition and annotation under low-light conditions, making fully supervised training difficult. Although pre-trained Vision-Language Models (VLMs) can enhance generalization for downstream tasks, their insufficient attention modeling in cross-domain scenarios leads to ineffective local semantic correlation. To address this, we propose a Prompt-based Cross-modal feature Alignment (PCA) method that improves weakly supervised IFER performance by leveraging RGB facial expression data. The PCA framework comprises two key components: (1) a Cross-modal Prompt Transfer (CPT) strategy that integrates category-specific information to distinguish expressions, and (2) an Image-Guided Alignment (IGA) module that achieves feature alignment using dual-domain feature banks. Experimental results on two benchmark datasets demonstrate that our method significantly outperforms current state-of-the-art approaches, confirming its effectiveness and superiority.
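Since the abstract describes the architecture only at a high level, the following is a minimal, hypothetical PyTorch sketch of the two ideas it names: learnable class-specific prompts standing in for the Cross-modal Prompt Transfer (CPT) strategy, and a dual-domain feature bank used for alignment standing in for the Image-Guided Alignment (IGA) module. All class names, dimensions, encoder choices, and the pseudo-pairing loss below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of prompt-based cross-modal feature alignment.
# The paper's exact CPT/IGA designs are not given in the abstract, so
# every component here is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedTextEncoder(nn.Module):
    """Learnable class-specific prompt vectors (CoOp-style) fed through
    a small Transformer standing in for a frozen VLM text encoder."""
    def __init__(self, num_classes, prompt_len=8, dim=512):
        super().__init__()
        # One learnable prompt sequence per expression category.
        self.prompts = nn.Parameter(torch.randn(num_classes, prompt_len, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self):
        out = self.encoder(self.prompts)             # (C, L, D)
        return F.normalize(out.mean(dim=1), dim=-1)  # one embedding per class

class FeatureBank:
    """Fixed-size FIFO bank of normalized features for one domain."""
    def __init__(self, size=4096, dim=512):
        self.bank = torch.zeros(size, dim)
        self.ptr, self.size = 0, size

    @torch.no_grad()
    def enqueue(self, feats):
        n = feats.shape[0]
        idx = torch.arange(self.ptr, self.ptr + n) % self.size
        self.bank[idx] = F.normalize(feats, dim=-1)
        self.ptr = (self.ptr + n) % self.size

def alignment_loss(ir_feats, rgb_bank, temperature=0.07):
    """Pull infrared features toward their most similar RGB bank
    entries: a simple stand-in for image-guided dual-domain alignment."""
    ir = F.normalize(ir_feats, dim=-1)
    sim = ir @ rgb_bank.bank.t() / temperature  # (B, K) similarities
    targets = sim.argmax(dim=1)                 # pseudo-pairing across domains
    return F.cross_entropy(sim, targets)

# Usage sketch: classify infrared features against prompted class
# embeddings while aligning them with an RGB feature bank.
num_classes, dim = 7, 512
text_enc = PromptedTextEncoder(num_classes, dim=dim)
rgb_bank = FeatureBank(dim=dim)

rgb_feats = torch.randn(32, dim)  # stand-in RGB encoder outputs
ir_feats = torch.randn(32, dim, requires_grad=True)
labels = torch.randint(0, num_classes, (32,))

rgb_bank.enqueue(rgb_feats)
class_emb = text_enc()                                   # (C, D)
logits = F.normalize(ir_feats, dim=-1) @ class_emb.t()   # (B, C)
loss = F.cross_entropy(logits / 0.07, labels) + alignment_loss(ir_feats, rgb_bank)
loss.backward()
```

One motivation for a bank-based design, if it mirrors the paper's intent, is that it decouples alignment from batch composition: infrared features are matched against a large pool of previously seen RGB features rather than only the current mini-batch.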
About the Journal
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.