Fan Zhang , Zhi-Qi Cheng , Jian Zhao , Xiaojiang Peng , Xuelong Li
Computer Vision and Image Understanding, Volume 260, Article 104451. Published 2025-07-25. DOI: 10.1016/j.cviu.2025.104451. https://www.sciencedirect.com/science/article/pii/S1077314225001742
LEAF: Unveiling two sides of the same coin in semi-supervised facial expression recognition
Semi-supervised learning has emerged as a promising approach to tackling label scarcity in facial expression recognition (FER). However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF) to coordinate expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that, by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://github.com/zfkarl/LEAF.
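The two mechanisms named in the abstract, gated fusion of decoupled representations and a positive/negative split of predictions with a two-view consistency loss, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). All function names are hypothetical, and the sigmoid gate, the top-k rule for forming the candidate label set, and the squared-error consistency loss are assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math


def sigmoid(x):
    """Logistic function, used here to map a gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(relevant, agnostic, gate_logits):
    """Fuse two decoupled feature vectors element-wise.

    Assumption: the gate is g = sigmoid(logit) and the fused feature is
    g * relevant + (1 - g) * agnostic. In a trained model the logits
    would be learnable parameters; here they are plain numbers.
    """
    fused = []
    for r, a, w in zip(relevant, agnostic, gate_logits):
        g = sigmoid(w)
        fused.append(g * r + (1.0 - g) * a)
    return fused


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_pseudo_label(probs, k=2):
    """Split class indices into a positive and a negative part.

    Assumption: the k most probable classes form the ambiguous
    (candidate) pseudo-label; every other class is treated as negative.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k]), set(order[k:])


def consistency_loss(probs_view_a, probs_view_b):
    """Mean squared difference between predictions on two augmented views.

    Assumption: squared error stands in for whatever consistency
    measure the paper actually uses.
    """
    n = len(probs_view_a)
    return sum((a - b) ** 2 for a, b in zip(probs_view_a, probs_view_b)) / n


if __name__ == "__main__":
    # Toy inputs: 3-dimensional features and 4 expression classes.
    fused = gated_fuse([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], [0.0, 2.0, -2.0])
    weak = softmax([2.0, 1.0, 0.1, -1.0])    # prediction on a weakly augmented view
    strong = softmax([1.5, 1.2, 0.0, -0.5])  # prediction on a strongly augmented view
    positive, negative = split_pseudo_label(weak, k=2)
    print(fused, positive, negative, consistency_loss(weak, strong))
```

With a gate logit of 0 the two components are averaged, while large positive or negative logits push the fused feature toward the expression-relevant or the expression-agnostic component respectively, which is the sense in which the fusion is adaptive.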
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems