GSCL-RVT: Generalized supervised contrastive learning with global–local feature fusion for micro-expression recognition

IF 3.3 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters Pub Date : 2025-06-17 DOI:10.1016/j.patrec.2025.05.027

Fan Song, Junhua Li, Zhengxiu Li, Ming Li

Pattern Recognition Letters, volume 196, pages 169-176. Journal article, not open access. Source: https://www.sciencedirect.com/science/article/pii/S0167865525002284

Citations: 0

Abstract

Micro-expressions (MEs) are instantaneous facial expressions that appear quickly after an emotionally evocative event and are difficult to suppress, and they can reveal one’s genuine feelings and emotions. With their spontaneous and transient nature, MEs provide a unique perspective for sentiment analysis. However, their subtle and transient nature, coupled with the scarcity and lack of diversity of existing datasets, brings great challenges in discriminative feature learning and model generalization. To address these issues, this paper proposes a novel micro-expression recognition (MER) framework. This framework integrates a feature fusion network by blending residual blocks with a vision transformer (RVT), which can capture local details and integrate global contextual information in images across multiple levels. Furthermore, a generalized supervised contrastive learning (GSCL) strategy is introduced in this paper, wherein traditional one-hot labels are transformed into mixed labels. This strategy then proceeds to compare the similarity between the mixed labels and anchors, with the aim of minimizing the cross-entropy between the label similarity and the potential similarity. This approach aims to optimize the semantic spatial metrics between different MEs and enhance the model’s feature learning capabilities. In addition, we propose a method for augmenting data through region substitution, based on the local features of samples belonging to the same category. This approach works synergistically with a generalized supervised contrastive learning framework, with the objective of addressing the issue of limited micro-expression (ME) data availability. Lastly, we conduct a series of experiments with both Single Database Evaluation (SDE) and Composite Database Evaluation (CDE) protocols, obtaining either optimal or near-optimal results. We also provide sufficiently interpretable analyses to demonstrate the superiority and effectiveness of our proposed methodology.
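The abstract describes the GSCL objective and the region-substitution augmentation only in words, so the following is a minimal NumPy sketch of one way they could look, not the authors' implementation. Everything named here is an assumption made for illustration: the function names `mix_labels`, `gscl_loss`, and `substitute_region`, the mixing weight `lam`, the temperature `tau`, the use of cosine similarity between label vectors as the "label similarity", a temperature-scaled softmax over embedding similarities as the "potential similarity", and a fixed rectangular `box` as the substituted region.

```python
import numpy as np


def mix_labels(y_a, y_b, lam):
    """Blend two one-hot label vectors into a mixed (soft) label.
    `lam` is a hypothetical mixing weight in [0, 1]."""
    y_a = np.asarray(y_a, dtype=np.float64)
    y_b = np.asarray(y_b, dtype=np.float64)
    return lam * y_a + (1.0 - lam) * y_b


def gscl_loss(z, y, tau=0.1):
    """Cross-entropy between label similarity and embedding similarity.

    z: (N, D) embeddings, N >= 2.  y: (N, C) one-hot or mixed labels.
    For each anchor i, the target distribution over the other samples j
    comes from label similarity, and the predicted distribution comes from
    a softmax over embedding similarities (both are assumptions).
    """
    z = np.asarray(z, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # exclude each anchor from its own row

    # Target: cosine similarity between label vectors, normalized per row.
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    label_sim = (y_unit @ y_unit.T) * off_diag
    target = label_sim / np.maximum(label_sim.sum(axis=1, keepdims=True), 1e-12)

    # Prediction: log-softmax over cosine similarities of embeddings.
    z_unit = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = np.where(off_diag, (z_unit @ z_unit.T) / tau, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_q = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_q = np.where(off_diag, log_q, 0.0)  # diagonal carries no target mass

    # Cross-entropy per anchor, averaged over the batch.
    return float(-(target * log_q).sum(axis=1).mean())


def substitute_region(img_a, img_b, box):
    """Copy a rectangular region of img_b into a copy of img_a.
    `box` = (top, left, height, width); both images are assumed to share
    the same class and the same shape, so the label is left unchanged."""
    top, left, h, w = box
    out = np.array(img_a, copy=True)
    out[top:top + h, left:left + w] = np.asarray(img_b)[top:top + h, left:left + w]
    return out


# Usage sketch with random data (values are illustrative only).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
labels = np.stack([
    mix_labels([1, 0, 0], [0, 1, 0], 0.7),
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.0, 0.0, 1.0]),
])
print("GSCL-style loss:", gscl_loss(embeddings, labels))
```

With one-hot labels the target distribution reduces to a uniform distribution over same-class samples, which is the usual supervised contrastive setting; mixed labels simply spread the target mass over partially matching samples.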




Source journal

Pattern Recognition Letters (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore

12.40

Self-citation rate

5.90%

Articles published

287

Review time

9.1 months

Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.