GSCL-RVT: Generalized supervised contrastive learning with global–local feature fusion for micro-expression recognition

IF 3.9 | CAS Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Fan Song, Junhua Li, Zhengxiu Li, Ming Li
Journal: Pattern Recognition Letters, Volume 196, Pages 169-176
Publication date: 2025-06-17
Publication type: Journal Article
DOI: 10.1016/j.patrec.2025.05.027
URL: https://www.sciencedirect.com/science/article/pii/S0167865525002284
Citations: 0

GSCL-RVT: Generalized supervised contrastive learning with global–local feature fusion for micro-expression recognition
Micro-expressions (MEs) are instantaneous facial expressions that appear quickly after an emotionally evocative event and are difficult to suppress, and they can reveal one’s genuine feelings and emotions. With their spontaneous and transient nature, MEs provide a unique perspective for sentiment analysis. However, their subtle and transient nature, coupled with the scarcity and lack of diversity of existing datasets, brings great challenges in discriminative feature learning and model generalization. To address these issues, this paper proposes a novel micro-expression recognition (MER) framework. This framework integrates a feature fusion network by blending residual blocks with a vision transformer (RVT), which can capture local details and integrate global contextual information in images across multiple levels. Furthermore, a generalized supervised contrastive learning (GSCL) strategy is introduced in this paper, wherein traditional one-hot labels are transformed into mixed labels. This strategy then proceeds to compare the similarity between the mixed labels and anchors, with the aim of minimizing the cross-entropy between the label similarity and the potential similarity. This approach aims to optimize the semantic spatial metrics between different MEs and enhance the model’s feature learning capabilities. In addition, we propose a method for augmenting data through region substitution, based on the local features of samples belonging to the same category. This approach works synergistically with a generalized supervised contrastive learning framework, with the objective of addressing the issue of limited micro-expression (ME) data availability. Lastly, we conduct a series of experiments with both Single Database Evaluation (SDE) and Composite Database Evaluation (CDE) protocols, obtaining either optimal or near-optimal results. We also provide sufficiently interpretable analyses to demonstrate the superiority and effectiveness of our proposed methodology.
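The GSCL objective described in the abstract (minimizing the cross-entropy between label similarity and latent similarity) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear label-mixing rule with coefficient `lam`, the cosine similarity between mixed labels, the L2 normalization of embeddings, and the `temperature` value are all assumptions made for illustration.

```python
import numpy as np


def mix_labels(y_a, y_b, lam, num_classes):
    """Blend two sets of one-hot labels: lam * onehot(y_a) + (1 - lam) * onehot(y_b).

    Hypothetical helper; the mixing rule used in the paper may differ.
    """
    eye = np.eye(num_classes)
    return lam * eye[y_a] + (1.0 - lam) * eye[y_b]


def _l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (rows of zeros are left at zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def gscl_loss(embeddings, soft_labels, temperature=0.1):
    """Cross-entropy between label similarity and latent similarity.

    embeddings:  (n, d) feature vectors, one per sample (the anchors).
    soft_labels: (n, c) mixed labels, one per sample.
    """
    z = _l2_normalize(embeddings)
    q = _l2_normalize(soft_labels)
    n = z.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # a sample is never compared with itself

    # Latent similarity: log-softmax over every other sample in the batch.
    logits = z @ z.T / temperature
    logits = np.where(off_diag, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Label similarity: cosine similarity of mixed labels, normalized per row
    # into a target distribution. A row with no similar labels stays all-zero
    # and contributes zero loss.
    target = (q @ q.T) * off_diag
    target = target / np.maximum(target.sum(axis=1, keepdims=True), 1e-12)

    # Mask the diagonal of log_p so that 0 * (-inf) never appears.
    ce = -(target * np.where(off_diag, log_p, 0.0)).sum(axis=1)
    return ce.mean()


# Usage with random features and mixed labels (assumed shapes and values).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
labels = mix_labels(np.array([0, 1, 2, 0]), np.array([1, 1, 0, 2]),
                    lam=0.6, num_classes=3)
print(gscl_loss(feats, labels))
```

With pure one-hot labels the target distribution is uniform over same-class samples, so the sketch reduces to a form of ordinary supervised contrastive loss; mixed labels instead give partial credit to samples that share only part of their label mass.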
Source journal
Pattern Recognition Letters (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months
About the journal: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.