{"title":"ferf - vamba:一个具有全局紧凑关注和层次特征交互的鲁棒面部表情识别框架","authors":"Hui Ma , Sen Lei , Heng-Chao Li , Turgay Celik","doi":"10.1016/j.inffus.2025.103371","DOIUrl":null,"url":null,"abstract":"<div><div>Facial Expression Recognition (FER) has broad applications in driver safety, human–computer interaction, and cognitive psychology research, where it helps analyze emotional states and enhance social interactions. However, FER in static images faces challenges due to occlusions and pose variations, which hinder the model’s effectiveness in real-world scenarios. To address these issues, we propose FER-VMamba, a robust and efficient architecture designed to improve FER performance in complex scenarios. FER-VMamba comprises two core modules: the Global Compact Attention Module (GCAM) and the Hierarchical Feature Interaction Module (HFIM). GCAM extracts compact global semantic features through Multi-Scale Hybrid Convolutions (MixConv), refining them with a Spatial Channel Attention Mechanism (SCAM) to improve robustness against occlusions and pose variations. HFIM captures local and global dependencies by segmenting feature maps into non-overlapping partitions, which the FER-VSS module processes with Conv-SCAM-Conv for local features and Visual State-Space (VSS) for global dependencies. Additionally, self-attention and relation-attention mechanisms in HFIM refine features by modeling inter-partition relationships, further improving the accuracy of expression recognition. Extensive experiments on the RAF and AffectNet datasets demonstrate that FER-VMamba achieves state-of-the-art performance. Furthermore, we introduce FSL-FER-VMamba, an extension of FER-VSS optimized for cross-domain few-shot FER, providing strong adaptability to domain shifts. <span><span>https://github.com/SwjtuMa/FER-VMamba.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"124 ","pages":"Article 103371"},"PeriodicalIF":14.7000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FER-VMamba: A robust facial expression recognition framework with global compact attention and hierarchical feature interaction\",\"authors\":\"Hui Ma , Sen Lei , Heng-Chao Li , Turgay Celik\",\"doi\":\"10.1016/j.inffus.2025.103371\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Facial Expression Recognition (FER) has broad applications in driver safety, human–computer interaction, and cognitive psychology research, where it helps analyze emotional states and enhance social interactions. However, FER in static images faces challenges due to occlusions and pose variations, which hinder the model’s effectiveness in real-world scenarios. To address these issues, we propose FER-VMamba, a robust and efficient architecture designed to improve FER performance in complex scenarios. FER-VMamba comprises two core modules: the Global Compact Attention Module (GCAM) and the Hierarchical Feature Interaction Module (HFIM). GCAM extracts compact global semantic features through Multi-Scale Hybrid Convolutions (MixConv), refining them with a Spatial Channel Attention Mechanism (SCAM) to improve robustness against occlusions and pose variations. HFIM captures local and global dependencies by segmenting feature maps into non-overlapping partitions, which the FER-VSS module processes with Conv-SCAM-Conv for local features and Visual State-Space (VSS) for global dependencies. 
Additionally, self-attention and relation-attention mechanisms in HFIM refine features by modeling inter-partition relationships, further improving the accuracy of expression recognition. Extensive experiments on the RAF and AffectNet datasets demonstrate that FER-VMamba achieves state-of-the-art performance. Furthermore, we introduce FSL-FER-VMamba, an extension of FER-VSS optimized for cross-domain few-shot FER, providing strong adaptability to domain shifts. <span><span>https://github.com/SwjtuMa/FER-VMamba.git</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"124 \",\"pages\":\"Article 103371\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2025-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525004440\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525004440","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
FER-VMamba: A robust facial expression recognition framework with global compact attention and hierarchical feature interaction
Facial Expression Recognition (FER) has broad applications in driver safety, human–computer interaction, and cognitive psychology research, where it helps analyze emotional states and enhance social interactions. However, FER in static images faces challenges due to occlusions and pose variations, which hinder the model's effectiveness in real-world scenarios. To address these issues, we propose FER-VMamba, a robust and efficient architecture designed to improve FER performance in complex scenarios. FER-VMamba comprises two core modules: the Global Compact Attention Module (GCAM) and the Hierarchical Feature Interaction Module (HFIM). GCAM extracts compact global semantic features through Multi-Scale Hybrid Convolutions (MixConv), refining them with a Spatial Channel Attention Mechanism (SCAM) to improve robustness against occlusions and pose variations. HFIM captures local and global dependencies by segmenting feature maps into non-overlapping partitions, which the FER-VSS module processes with Conv-SCAM-Conv for local features and Visual State-Space (VSS) for global dependencies. Additionally, self-attention and relation-attention mechanisms in HFIM refine features by modeling inter-partition relationships, further improving the accuracy of expression recognition. Extensive experiments on the RAF and AffectNet datasets demonstrate that FER-VMamba achieves state-of-the-art performance. Furthermore, we introduce FSL-FER-VMamba, an extension of FER-VSS optimized for cross-domain few-shot FER, providing strong adaptability to domain shifts. The code is available at https://github.com/SwjtuMa/FER-VMamba.git.
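To make the described pipeline more concrete, the following is a minimal PyTorch sketch of the GCAM path (MixConv features refined by SCAM) as the abstract outlines it. Every internal choice here, including kernel sizes, the channel-reduction ratio, and the exact gating layout, is an illustrative assumption inferred from the abstract, not the authors' released implementation; the actual code is in the linked repository.

```python
# Hypothetical sketch of the GCAM path in FER-VMamba, as summarized in the
# abstract. Module internals (kernel sizes, reduction ratio, gating design)
# are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class SCAM(nn.Module):
    """Spatial Channel Attention Mechanism (assumed form: a channel gate
    from global average pooling, then a spatial gate from a 7x7 conv)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # reweight channels
        pooled = torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1
        )
        return x * self.spatial_gate(pooled)  # reweight spatial positions


class MixConv(nn.Module):
    """Multi-Scale Hybrid Convolution (assumed design): split channels,
    apply depthwise convs with different kernel sizes, and concatenate."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[-1] += channels - sum(splits)  # absorb any remainder
        self.splits = splits
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)
            for c, k in zip(splits, kernel_sizes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat(
            [branch(c) for branch, c in zip(self.branches, chunks)], dim=1
        )


class GCAM(nn.Module):
    """Global Compact Attention Module: MixConv features refined by SCAM."""

    def __init__(self, channels: int):
        super().__init__()
        self.mixconv = MixConv(channels)
        self.scam = SCAM(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scam(self.mixconv(x))


if __name__ == "__main__":
    x = torch.randn(2, 64, 56, 56)  # a batch of intermediate feature maps
    print(GCAM(64)(x).shape)        # torch.Size([2, 64, 56, 56])
```

The multi-scale depthwise branches capture facial cues at several receptive-field sizes cheaply, and the attention gates then suppress occluded or pose-distorted regions, which matches the robustness rationale the abstract gives for pairing MixConv with SCAM.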
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines that drive its progress. It is the leading outlet for research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.