GSCL-RVT: Generalized supervised contrastive learning with global–local feature fusion for micro-expression recognition
Authors: Fan Song, Junhua Li, Zhengxiu Li, Ming Li
Journal: Pattern Recognition Letters, Volume 196, Pages 169-176
Publication date: 2025-06-17
DOI: 10.1016/j.patrec.2025.05.027
URL: https://www.sciencedirect.com/science/article/pii/S0167865525002284
Impact factor: 3.9 (JCR Q2, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Micro-expressions (MEs) are fleeting facial expressions that appear shortly after an emotionally evocative event and are difficult to suppress, so they can reveal a person’s genuine feelings. Because they are spontaneous and transient, MEs offer a unique perspective for sentiment analysis. However, their subtlety and brevity, together with the scarcity and limited diversity of existing datasets, make discriminative feature learning and model generalization challenging. To address these issues, this paper proposes a novel micro-expression recognition (MER) framework. The framework uses a feature fusion network that blends residual blocks with a vision transformer (RVT), capturing local details and integrating global contextual information in images at multiple levels. In addition, a generalized supervised contrastive learning (GSCL) strategy is introduced, in which traditional one-hot labels are transformed into mixed labels. The strategy compares the similarity between the mixed labels and the anchors and minimizes the cross-entropy between the label similarity and the potential similarity. The aim is to optimize the semantic spatial metrics between different MEs and strengthen the model’s feature learning. We also propose a data augmentation method based on region substitution, which draws on the local features of samples belonging to the same category. This augmentation works together with the GSCL framework to address the limited availability of ME data. Finally, we conduct experiments under both the Single Database Evaluation (SDE) and Composite Database Evaluation (CDE) protocols, obtaining optimal or near-optimal results. We also provide interpretability analyses to demonstrate the effectiveness of the proposed method.
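The GSCL objective described in the abstract (a cross-entropy between a label-similarity distribution and a similarity computed from the learned features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that label similarity is the inner product of mixed labels normalized per anchor, that the feature-side similarity is a softmax over temperature-scaled cosine similarities, and that the temperature is 0.1. The names `mix_labels` and `gscl_loss` are hypothetical.

```python
import numpy as np


def mix_labels(y_a, y_b, lam):
    """Blend two sets of one-hot labels into mixed (soft) labels.

    Assumption: a convex combination with weight `lam`; the abstract does
    not specify how the mixed labels are actually produced.
    """
    return lam * y_a + (1.0 - lam) * y_b


def gscl_loss(z, y, temperature=0.1):
    """Cross-entropy between label similarity and feature similarity.

    z: (N, D) embeddings, one row per sample (each row acts as an anchor).
    y: (N, C) mixed labels, each row summing to 1.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length embeddings
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # mask that drops each anchor's self-pair

    # Feature-side similarity: log-softmax over the other samples.
    logits = z @ z.T / temperature
    logits = np.where(off_diag, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_q = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Label-side similarity: inner product of mixed labels, normalized per anchor.
    s = (y @ y.T) * off_diag
    p = s / np.clip(s.sum(axis=1, keepdims=True), 1e-12, None)

    # Cross-entropy H(p, q) per anchor, averaged over the batch.
    # Self-pairs are zeroed explicitly to avoid 0 * (-inf).
    return float(-(p * np.where(off_diag, log_q, 0.0)).sum(axis=1).mean())
```

With one-hot labels the label-similarity distribution places equal mass on every other sample of the anchor's class and none elsewhere. With mixed labels, partially matching samples receive partial weight, which is the sense in which the objective is "generalized" in this sketch.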
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.