{"title":"AMGSN: Adaptive mask-guide supervised network for debiased facial expression recognition","authors":"Tianlong Gu, Hao Li, Xuan Feng, Yiqin Luo","doi":"10.1016/j.patcog.2025.112023","DOIUrl":null,"url":null,"abstract":"<div><div>Facial expression recognition plays a crucial role in understanding human emotions and behavior. However, existing models often exhibit biases and imbalance towards diverse expression classes. To address this problem, we propose an Adaptive Mask-Guide Supervised Network (AMGSN) to enhance the uniform performance of the facial expression recognition models. We propose an adaptive mask guidance mechanism to mitigate bias and ensure uniform performance across different expression classes. AMGSN focuses on learning the ability to distinguish facial features with under-expressed expressions by dynamically generating masks during pre-training. Specifically, we employ an asymmetric encoder–decoder architecture, where the encoder encodes only the unmasked visible regions, while the lightweight decoder reconstructs the original image using latent representations and mask markers. By utilizing dynamically generated masks and focusing on informative regions, these models effectively reduce the interference of confounding factors, thus enhancing the discriminative power of the learned representation. In the pre-training stage, we introduce the Attention-Based Mask Generator (ABMG) to identify salient regions of expressions. Additionally, we advance the Mask Ratio Update Strategy (MRUS), which utilizes image reconstruction loss, to adjust the mask ratio for each image during pre-training. In the finetune stage, debiased center loss and contrastive loss are introduced to optimize the network to ensure the overall performance of expression recognition. Extensive experimental results on several standard datasets demonstrate that the proposed AMGSN significantly improves both balance and accuracy compared to state-of-the-art methods. 
For example, AMGSN reached 89.34% on RAF-DB, and 62.83% on AffectNet, respectively, with a standard deviation of only 0.0746 and 0.0484. This demonstrates the effectiveness of our improvements<span><span><sup>1</sup></span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112023"},"PeriodicalIF":7.6000,"publicationDate":"2025-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325006831","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
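The abstract's attention-guided masking (ABMG) can be sketched roughly as follows: score image patches by saliency and mask a fixed fraction of the highest-scoring ones, so that the encoder only sees the remaining visible patches. This is an illustrative assumption, not the paper's implementation — the abstract does not specify how attention is computed, nor whether salient patches are masked or kept visible; the function name, shapes, and stand-in scores below are hypothetical.

```python
import numpy as np

# Hypothetical sketch of attention-guided patch masking (ABMG-style).
# The "attention" here is just a stand-in array; how AMGSN actually
# scores and selects patches is not described in the abstract.

def attention_guided_mask(attn_scores, mask_ratio):
    """Return a boolean mask (True = masked) covering the mask_ratio
    fraction of patches with the highest attention scores."""
    n = attn_scores.shape[0]
    k = int(round(mask_ratio * n))
    order = np.argsort(attn_scores)[::-1]  # most salient patches first
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True
    return mask

# Toy usage: 8 patches, mask the top 50% by score; the encoder would
# then process only the patches where mask is False.
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.05])
mask = attention_guided_mask(scores, 0.5)
print(mask.sum())  # number of masked patches
```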
Citations: 0
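The Mask Ratio Update Strategy (MRUS) described above adjusts each image's mask ratio using the reconstruction loss. One plausible reading — easy-to-reconstruct images get masked more aggressively, hard ones less — can be sketched as below; the update rule, bounds, step size, and running-average comparison are all assumptions for illustration, not the paper's actual formulation.

```python
# Hypothetical MRUS-style update: raise the mask ratio for images the
# model reconstructs easily (loss below a running average), lower it
# for hard ones. The bounds and step size are illustrative guesses.

def update_mask_ratio(ratio, recon_loss, loss_ema,
                      min_ratio=0.4, max_ratio=0.9, step=0.05):
    """Return the new per-image mask ratio, clamped to [min_ratio, max_ratio]."""
    if recon_loss < loss_ema:   # easy image -> mask more patches
        return min(max_ratio, ratio + step)
    return max(min_ratio, ratio - step)  # hard image -> reveal more

# Toy usage: track a running average of the loss across pre-training steps.
ema, ratio = 1.0, 0.75
for loss in [1.2, 0.8, 0.5, 1.5]:
    ratio = update_mask_ratio(ratio, loss, ema)
    ema = 0.9 * ema + 0.1 * loss  # exponential moving average of the loss
print(round(ratio, 2))
```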
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.