Debiased facial expression recognition based on an adaptive mask-guided supervised network

IF 7.6 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Tianlong Gu, Hao Li, Xuan Feng, Yiqin Luo
Journal: Pattern Recognition, volume 170, Article 112023
Publication date: 2025-06-28
Publication type: Journal Article
DOI: 10.1016/j.patcog.2025.112023
Source: https://www.sciencedirect.com/science/article/pii/S0031320325006831
Citations: 0

Abstract

AMGSN: Adaptive mask-guide supervised network for debiased facial expression recognition
Facial expression recognition plays a crucial role in understanding human emotions and behavior. However, existing models often exhibit bias and imbalanced performance across expression classes. To address this problem, we propose an Adaptive Mask-Guide Supervised Network (AMGSN), whose adaptive mask guidance mechanism mitigates bias and promotes uniform performance across expression classes. AMGSN learns to distinguish the facial features of under-expressed expressions by dynamically generating masks during pre-training. Specifically, we employ an asymmetric encoder–decoder architecture, where the encoder encodes only the unmasked visible regions, while the lightweight decoder reconstructs the original image from the latent representations and mask markers. By using dynamically generated masks and focusing on informative regions, the model reduces the interference of confounding factors, thus enhancing the discriminative power of the learned representation. In the pre-training stage, we introduce the Attention-Based Mask Generator (ABMG) to identify salient regions of expressions. Additionally, we propose the Mask Ratio Update Strategy (MRUS), which uses the image reconstruction loss to adjust the mask ratio for each image during pre-training. In the fine-tuning stage, a debiased center loss and a contrastive loss are introduced to optimize the network and ensure the overall performance of expression recognition. Extensive experiments on several standard datasets demonstrate that the proposed AMGSN significantly improves both balance and accuracy compared to state-of-the-art methods. For example, AMGSN reached 89.34% on RAF-DB and 62.83% on AffectNet, with standard deviations of only 0.0746 and 0.0484, respectively. This demonstrates the effectiveness of our improvements.
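The abstract names three ideas without giving formulas: attention-guided masking, a per-image mask ratio driven by reconstruction loss, and a standard deviation used as a balance measure. The sketch below is illustrative only and is not the authors' implementation. Every function name, parameter, and threshold is hypothetical. It assumes that the highest-attention patches are the ones masked, that the ratio rises when reconstruction is easy and falls otherwise, and that the reported standard deviation is a population standard deviation over per-class accuracies. None of these details are confirmed by the abstract.

```python
import statistics


def attention_guided_mask(attn_scores, mask_ratio):
    """Return one boolean per patch; True means the patch is hidden from the encoder.

    Assumption (not from the paper): the most salient patches are masked first.
    """
    n = len(attn_scores)
    # Keep at least one patch visible so the encoder always has input.
    n_mask = max(0, min(n - 1, round(n * mask_ratio)))
    order = sorted(range(n), key=lambda i: attn_scores[i], reverse=True)
    masked = set(order[:n_mask])
    return [i in masked for i in range(n)]


def update_mask_ratio(ratio, recon_loss, target_loss, step=0.05, lo=0.3, hi=0.9):
    """Hypothetical per-image update rule driven by reconstruction loss.

    If the image is reconstructed easily (loss below the target), mask more;
    otherwise mask less. The result is clamped to [lo, hi].
    """
    ratio = ratio + step if recon_loss < target_loss else ratio - step
    return min(hi, max(lo, ratio))


def per_class_accuracy_std(y_true, y_pred):
    """Population standard deviation of per-class accuracy (lower = more balanced)."""
    correct, total = {}, {}
    for t, p in zip(y_true, y_pred):
        total[t] = total.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    accs = [correct[c] / total[c] for c in sorted(total)]
    return statistics.pstdev(accs)


if __name__ == "__main__":
    attn = [0.1, 0.9, 0.3, 0.7]          # toy attention score per patch
    print(attention_guided_mask(attn, mask_ratio=0.5))
    print(update_mask_ratio(0.5, recon_loss=0.1, target_loss=0.2))
    print(per_class_accuracy_std([0, 0, 1, 1], [0, 0, 1, 0]))
```

In a real pipeline the attention scores would come from the model itself and the masks would be applied to image patches before encoding. Plain lists stand in for both here to keep the example self-contained.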
Source journal
Pattern Recognition (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.