Point-Supervised Facial Expression Spotting With Gaussian-Based Instance-Adaptive Intensity Modeling

IEEE Transactions on Biometrics, Behavior, and Identity Science (IF 5.0)
Pub Date: 2026-03-01; Epub Date: 2026-01-12; DOI: 10.1109/TBIOM.2026.3651893

Yicheng Deng, Hideaki Hayashi, Hajime Nagahara

Volume 8, Issue 3, pp. 378-391. IEEE Xplore: https://ieeexplore.ieee.org/document/11339514/


Abstract

Automatic facial expression spotting, which aims to identify facial expression instances in untrimmed videos, is crucial for facial expression analysis. Existing methods primarily focus on fully supervised learning and rely on costly, time-consuming temporal boundary annotations. In this paper, we investigate point-supervised facial expression spotting (P-FES), where training requires only a single timestamp annotation per instance. We propose a unique two-branch framework for P-FES. First, hard pseudo-labeling often confuses neutral frames with expression frames of varying intensity. To mitigate this limitation, we propose a Gaussian-based instance-adaptive intensity modeling (GIM) module that models the instance-level expression intensity distribution for soft pseudo-labeling. By detecting the pseudo-apex frame around each point label, estimating the instance duration, and constructing an instance-level Gaussian distribution, GIM assigns soft pseudo-labels to expression frames for more reliable intensity supervision. The GIM module is incorporated into our framework to optimize the class-agnostic expression intensity branch. Second, we design a class-aware apex classification branch that distinguishes macro- and micro-expressions based solely on their pseudo-apex frames. During inference, the two branches work independently: the class-agnostic expression intensity branch generates expression proposals, while the class-aware apex classification branch classifies each proposal as a macro- or micro-expression. Furthermore, we introduce an intensity-aware contrastive loss that enhances discriminative feature learning and suppresses neutral noise by contrasting neutral frames with expression frames of varying intensity. Extensive experiments on the SAMM-LV, CAS(ME)^2, and CAS(ME)^3 datasets demonstrate the effectiveness of the proposed framework. Code is available at https://github.com/KinopioIsAllIn/GIM
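The three GIM steps named in the abstract (pseudo-apex detection, duration estimation, Gaussian soft labeling) can be illustrated with a rough sketch. This is not the authors' implementation; the abstract does not specify the procedure. The function name `gaussian_soft_labels`, the parameters `search_radius` and `rel_threshold`, the thresholding rule used to estimate duration, and the choice of sigma as a quarter of the duration are all assumptions made purely for illustration.

```python
import math


def gaussian_soft_labels(scores, point, search_radius=5, rel_threshold=0.5):
    """Hypothetical sketch of Gaussian soft pseudo-labeling for one instance.

    scores: per-frame expression intensity predictions in [0, 1]
    point:  frame index of the single timestamp annotation
    Returns (apex_index, (start, end), soft_labels).
    """
    n = len(scores)

    # Step 1 (assumed rule): the pseudo-apex is the highest-scoring frame
    # inside a window centered on the point label.
    lo = max(0, point - search_radius)
    hi = min(n - 1, point + search_radius)
    apex = max(range(lo, hi + 1), key=lambda t: scores[t])

    # Step 2 (assumed rule): grow the instance outward from the apex while
    # the score stays above a fraction of the apex score.
    cutoff = rel_threshold * scores[apex]
    start = apex
    while start > 0 and scores[start - 1] >= cutoff:
        start -= 1
    end = apex
    while end < n - 1 and scores[end + 1] >= cutoff:
        end += 1
    duration = end - start + 1

    # Step 3 (assumed rule): a Gaussian centered on the apex whose width
    # scales with the estimated duration; frames outside the instance get 0.
    sigma = max(duration / 4.0, 1.0)
    labels = [0.0] * n
    for t in range(start, end + 1):
        labels[t] = math.exp(-((t - apex) ** 2) / (2.0 * sigma ** 2))
    return apex, (start, end), labels


if __name__ == "__main__":
    # Toy intensity curve; the annotated point (frame 3) is near, not at, the peak.
    toy_scores = [0.05, 0.1, 0.3, 0.6, 0.9, 0.7, 0.4, 0.1, 0.05, 0.02]
    apex, span, labels = gaussian_soft_labels(toy_scores, point=3, search_radius=2)
    print(apex, span)
    print([round(v, 3) for v in labels])
```

In this sketch the soft label peaks at 1.0 on the pseudo-apex and decays toward the estimated boundaries, so low-intensity onset and offset frames receive weaker supervision than they would under hard 0/1 pseudo-labels. The released code at the repository above is the reference for how the paper actually performs each step.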



