{"title":"Masked Graph Attention network for classification of facial micro-expression","authors":"Ankith Jain Rakesh Kumar, Bir Bhanu","doi":"10.1016/j.imavis.2025.105584","DOIUrl":null,"url":null,"abstract":"<div><div>Facial micro-expressions (MEs) are ultra-fine, quick, and short-motion muscle movements expressing a person’s true feelings. Automatic recognition of MEs with only a few samples is challenging and the extraction of subtle features becomes crucial. This paper addresses these intricacies and presents a novel dual-branch (branch1 for node locations and branch2 for optical flow patch information) masked graph attention network-based approach (MaskGAT) to classify MEs in a video. It utilizes a three-frame graph structure to extract spatio-temporal information. It learns a mask for each node to eliminate the less important node features and propagates the important node features to the neighboring nodes. A masked self-attention graph pooling layer is designed to provide the attention score to eliminate the unwanted nodes and uses only the nodes with a high attention score. An adaptive frame selection mechanism is designed that is based on a sliding window optical flow method to discard the low-intensity emotion frames. A well-designed dual-branch fusion mechanism is developed to extract informative features for the final classification of MEs. Furthermore, the paper presents a detailed mathematical analysis and visualization of the MaskGAT pipeline to demonstrate the effectiveness of node feature masking and pooling. The results are presented and compared with the state-of-the-art methods for SMIC, SAMM, CASME II, and MMEW databases. 
Further, cross-dataset experiments are carried out, and the results are reported.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"160 ","pages":"Article 105584"},"PeriodicalIF":4.2000,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625001726","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Facial micro-expressions (MEs) are ultra-fine, quick, short-duration muscle movements that express a person’s true feelings. Automatic recognition of MEs with only a few samples is challenging, and the extraction of subtle features becomes crucial. This paper addresses these intricacies and presents a novel dual-branch (branch 1 for node locations and branch 2 for optical flow patch information) masked graph attention network-based approach (MaskGAT) to classify MEs in a video. It utilizes a three-frame graph structure to extract spatio-temporal information. It learns a mask for each node to eliminate the less important node features and propagates the important node features to the neighboring nodes. A masked self-attention graph pooling layer is designed that assigns an attention score to each node, eliminating unwanted nodes and retaining only those with high attention scores. An adaptive frame selection mechanism based on a sliding-window optical flow method is designed to discard low-intensity emotion frames. A well-designed dual-branch fusion mechanism is developed to extract informative features for the final classification of MEs. Furthermore, the paper presents a detailed mathematical analysis and visualization of the MaskGAT pipeline to demonstrate the effectiveness of node feature masking and pooling. The results are presented and compared with state-of-the-art methods on the SMIC, SAMM, CASME II, and MMEW databases. Further, cross-dataset experiments are carried out, and the results are reported.
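The masking-and-pooling step described in the abstract can be sketched as follows. This is a hypothetical illustration only, not the authors' implementation: the linear scoring projection, the sigmoid gating of surviving features, and the `keep_ratio` parameter are all assumptions in the spirit of standard self-attention graph pooling.

```python
import numpy as np

def masked_attention_pool(node_feats, adj, proj, keep_ratio=0.5):
    """Hypothetical sketch of masked self-attention graph pooling:
    score each node, keep only the top-k, and gate the survivors."""
    scores = node_feats @ proj                 # one attention score per node
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[::-1][:k]        # indices of high-score nodes
    keep = np.sort(keep)                       # preserve original node order
    gate = 1.0 / (1.0 + np.exp(-scores[keep])) # sigmoid gate on kept scores
    pooled_feats = node_feats[keep] * gate[:, None]
    pooled_adj = adj[np.ix_(keep, keep)]       # induced subgraph adjacency
    return pooled_feats, pooled_adj, keep
```

With `keep_ratio=0.5` on a four-node graph, the two lowest-scoring nodes are dropped entirely, which mirrors the abstract's idea of eliminating unwanted nodes and propagating only the important features.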
Journal overview:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.