基于自适应掩码引导监督网络的去偏见面部表情识别

IF 7.6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Pub Date : 2025-06-28 DOI:10.1016/j.patcog.2025.112023

Tianlong Gu, Hao Li, Xuan Feng, Yiqin Luo

{"title":"基于自适应掩码引导监督网络的去偏见面部表情识别","authors":"Tianlong Gu, Hao Li, Xuan Feng, Yiqin Luo","doi":"10.1016/j.patcog.2025.112023","DOIUrl":null,"url":null,"abstract":"<div><div>Facial expression recognition plays a crucial role in understanding human emotions and behavior. However, existing models often exhibit biases and imbalance towards diverse expression classes. To address this problem, we propose an Adaptive Mask-Guide Supervised Network (AMGSN) to enhance the uniform performance of the facial expression recognition models. We propose an adaptive mask guidance mechanism to mitigate bias and ensure uniform performance across different expression classes. AMGSN focuses on learning the ability to distinguish facial features with under-expressed expressions by dynamically generating masks during pre-training. Specifically, we employ an asymmetric encoder–decoder architecture, where the encoder encodes only the unmasked visible regions, while the lightweight decoder reconstructs the original image using latent representations and mask markers. By utilizing dynamically generated masks and focusing on informative regions, these models effectively reduce the interference of confounding factors, thus enhancing the discriminative power of the learned representation. In the pre-training stage, we introduce the Attention-Based Mask Generator (ABMG) to identify salient regions of expressions. Additionally, we advance the Mask Ratio Update Strategy (MRUS), which utilizes image reconstruction loss, to adjust the mask ratio for each image during pre-training. In the finetune stage, debiased center loss and contrastive loss are introduced to optimize the network to ensure the overall performance of expression recognition. Extensive experimental results on several standard datasets demonstrate that the proposed AMGSN significantly improves both balance and accuracy compared to state-of-the-art methods. For example, AMGSN reached 89.34% on RAF-DB, and 62.83% on AffectNet, respectively, with a standard deviation of only 0.0746 and 0.0484. This demonstrates the effectiveness of our improvements<span><span><sup>1</sup></span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112023"},"PeriodicalIF":7.6000,"publicationDate":"2025-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AMGSN: Adaptive mask-guide supervised network for debiased facial expression recognition\",\"authors\":\"Tianlong Gu, Hao Li, Xuan Feng, Yiqin Luo\",\"doi\":\"10.1016/j.patcog.2025.112023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Facial expression recognition plays a crucial role in understanding human emotions and behavior. However, existing models often exhibit biases and imbalance towards diverse expression classes. To address this problem, we propose an Adaptive Mask-Guide Supervised Network (AMGSN) to enhance the uniform performance of the facial expression recognition models. We propose an adaptive mask guidance mechanism to mitigate bias and ensure uniform performance across different expression classes. AMGSN focuses on learning the ability to distinguish facial features with under-expressed expressions by dynamically generating masks during pre-training. Specifically, we employ an asymmetric encoder–decoder architecture, where the encoder encodes only the unmasked visible regions, while the lightweight decoder reconstructs the original image using latent representations and mask markers. By utilizing dynamically generated masks and focusing on informative regions, these models effectively reduce the interference of confounding factors, thus enhancing the discriminative power of the learned representation. In the pre-training stage, we introduce the Attention-Based Mask Generator (ABMG) to identify salient regions of expressions. Additionally, we advance the Mask Ratio Update Strategy (MRUS), which utilizes image reconstruction loss, to adjust the mask ratio for each image during pre-training. In the finetune stage, debiased center loss and contrastive loss are introduced to optimize the network to ensure the overall performance of expression recognition. Extensive experimental results on several standard datasets demonstrate that the proposed AMGSN significantly improves both balance and accuracy compared to state-of-the-art methods. For example, AMGSN reached 89.34% on RAF-DB, and 62.83% on AffectNet, respectively, with a standard deviation of only 0.0746 and 0.0484. This demonstrates the effectiveness of our improvements<span><span><sup>1</sup></span></span>.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"170 \",\"pages\":\"Article 112023\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-06-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325006831\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325006831","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

面部表情识别在理解人类情绪和行为方面起着至关重要的作用。然而，现有的模型往往表现出对不同表达类别的偏见和不平衡。为了解决这个问题，我们提出了一种自适应掩码引导监督网络（AMGSN）来增强面部表情识别模型的一致性。我们提出了一种自适应掩码引导机制，以减轻偏见，并确保在不同的表达类中表现一致。AMGSN的重点是通过在预训练过程中动态生成面具来学习区分面部特征和表达不足的能力。具体来说，我们采用了非对称编码器-解码器架构，其中编码器仅编码未被掩盖的可见区域，而轻量级解码器使用潜在表示和掩码标记重建原始图像。这些模型利用动态生成的掩模，关注信息区域，有效地减少了混杂因素的干扰，从而增强了学习表征的判别能力。在预训练阶段，我们引入了基于注意力的面具生成器（ABMG）来识别表情的显著区域。此外，我们提出了Mask Ratio Update Strategy (MRUS)，利用图像重建损失在预训练期间调整每个图像的Mask Ratio。在微调阶段，引入去偏中心损失和对比损失对网络进行优化，保证表情识别的整体性能。在多个标准数据集上的大量实验结果表明，与目前的方法相比，所提出的AMGSN在平衡和精度方面都有显著提高。例如，AMGSN在RAF-DB上达到89.34%，在AffectNet上达到62.83%，标准差仅为0.0746和0.0484。这证明了我们改进的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

AMGSN: Adaptive mask-guide supervised network for debiased facial expression recognition

Facial expression recognition plays a crucial role in understanding human emotions and behavior. However, existing models often exhibit biases and imbalance towards diverse expression classes. To address this problem, we propose an Adaptive Mask-Guide Supervised Network (AMGSN) to enhance the uniform performance of the facial expression recognition models. We propose an adaptive mask guidance mechanism to mitigate bias and ensure uniform performance across different expression classes. AMGSN focuses on learning the ability to distinguish facial features with under-expressed expressions by dynamically generating masks during pre-training. Specifically, we employ an asymmetric encoder–decoder architecture, where the encoder encodes only the unmasked visible regions, while the lightweight decoder reconstructs the original image using latent representations and mask markers. By utilizing dynamically generated masks and focusing on informative regions, these models effectively reduce the interference of confounding factors, thus enhancing the discriminative power of the learned representation. In the pre-training stage, we introduce the Attention-Based Mask Generator (ABMG) to identify salient regions of expressions. Additionally, we advance the Mask Ratio Update Strategy (MRUS), which utilizes image reconstruction loss, to adjust the mask ratio for each image during pre-training. In the finetune stage, debiased center loss and contrastive loss are introduced to optimize the network to ensure the overall performance of expression recognition. Extensive experimental results on several standard datasets demonstrate that the proposed AMGSN significantly improves both balance and accuracy compared to state-of-the-art methods. For example, AMGSN reached 89.34% on RAF-DB, and 62.83% on AffectNet, respectively, with a standard deviation of only 0.0746 and 0.0484. This demonstrates the effectiveness of our improvements¹.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Pattern Recognition 工程技术-工程：电子与电气

CiteScore

14.40

自引率

16.20%

发文量

683

审稿时长

5.6 months

期刊介绍： The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.