An Enhanced Adaptive Confidence Margin for Semi-Supervised Facial Expression Recognition
Hangyu Li, Nannan Wang, Xi Yang, Xiaoyu Wang, Xinbo Gao
IEEE Transactions on Pattern Analysis and Machine Intelligence (JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 18.6)
Published: 2025-09-22 (Journal Article)
DOI: 10.1109/tpami.2025.3612953 (https://doi.org/10.1109/tpami.2025.3612953)
Citations: 0
Abstract
Semi-supervised learning (SSL) provides a practical framework for leveraging massive unlabeled samples, especially when labels are expensive for facial expression recognition (FER). Typical SSL methods like FixMatch select unlabeled samples with confidence scores above a fixed threshold for training. However, these methods face two primary limitations: they fail to consider the varying confidence across facial expression categories, and they fail to utilize unlabeled facial expression samples efficiently. To address these challenges, we propose an Enhanced Adaptive Confidence Margin (EACM), consisting of dynamic thresholds for different categories, to fully learn from unlabeled samples. Specifically, we employ the predictions on labeled samples at each training iteration to learn an EACM. It then partitions unlabeled samples into two subsets: (1) subset I, containing samples whose confidence scores are no less than the margin; (2) subset II, containing samples whose confidence scores are less than the margin. For samples in subset I, we constrain their predictions on strongly-augmented versions to match the pseudo-labels derived from the predictions on weakly-augmented versions. Meanwhile, we introduce a feature-level contrastive objective to enhance the similarity between two weakly-augmented features of a sample in subset II. We extensively evaluate EACM on image-based and video-based facial expression datasets, showing that our method achieves superior performance, significantly surpassing fully-supervised baselines in a semi-supervised manner. Additionally, our EACM shows promise in leveraging cross-dataset unlabeled samples during practical training to boost fully-supervised performance. The source code is made publicly available at https://github.com/hangyu94/Ada-CM/tree/main/Journal.
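The margin-and-partition step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation (which is available at the GitHub link above): it assumes the per-class margin is derived from the mean confidence of correctly classified labeled samples at the current iteration, and that each unlabeled sample is compared against the margin of its pseudo-labeled class. The exact margin formula in the paper may differ.

```python
import numpy as np


def class_margins(labeled_probs, labeled_targets, num_classes, fallback=0.95):
    """Per-class adaptive margin: mean confidence of the correctly
    predicted labeled samples of each class (illustrative assumption).
    `fallback` is used when a class has no correct prediction this
    iteration (hypothetical default, not from the paper)."""
    preds = labeled_probs.argmax(axis=1)
    conf = labeled_probs.max(axis=1)
    margins = np.full(num_classes, fallback)
    for c in range(num_classes):
        mask = (labeled_targets == c) & (preds == c)
        if mask.any():
            margins[c] = conf[mask].mean()
    return margins


def partition_unlabeled(unlabeled_probs, margins):
    """Split an unlabeled batch: subset I gets samples whose confidence
    is no less than the margin of their pseudo-labeled class (kept for
    FixMatch-style consistency training on strong augmentations);
    subset II gets the rest (used for the feature-level contrastive
    objective on weak augmentations)."""
    pseudo = unlabeled_probs.argmax(axis=1)
    conf = unlabeled_probs.max(axis=1)
    in_subset1 = conf >= margins[pseudo]
    return np.where(in_subset1)[0], np.where(~in_subset1)[0]
```

Because the margins are recomputed from labeled predictions at every training iteration, the per-class thresholds rise or fall as the model's confidence on each expression category changes, which is the "adaptive" property the abstract contrasts with FixMatch's single fixed threshold.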
About the journal:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.