{"title":"Masked Graph Attention network for classification of facial micro-expression","authors":"Ankith Jain Rakesh Kumar, Bir Bhanu","doi":"10.1016/j.imavis.2025.105584","DOIUrl":null,"url":null,"abstract":"<div><div>Facial micro-expressions (MEs) are ultra-fine, quick, and short-motion muscle movements expressing a person’s true feelings. Automatic recognition of MEs with only a few samples is challenging and the extraction of subtle features becomes crucial. This paper addresses these intricacies and presents a novel dual-branch (branch1 for node locations and branch2 for optical flow patch information) masked graph attention network-based approach (MaskGAT) to classify MEs in a video. It utilizes a three-frame graph structure to extract spatio-temporal information. It learns a mask for each node to eliminate the less important node features and propagates the important node features to the neighboring nodes. A masked self-attention graph pooling layer is designed to provide the attention score to eliminate the unwanted nodes and uses only the nodes with a high attention score. An adaptive frame selection mechanism is designed that is based on a sliding window optical flow method to discard the low-intensity emotion frames. A well-designed dual-branch fusion mechanism is developed to extract informative features for the final classification of MEs. Furthermore, the paper presents a detailed mathematical analysis and visualization of the MaskGAT pipeline to demonstrate the effectiveness of node feature masking and pooling. The results are presented and compared with the state-of-the-art methods for SMIC, SAMM, CASME II, and MMEW databases. 
Further, cross-dataset experiments are carried out, and the results are reported.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"160 ","pages":"Article 105584"},"PeriodicalIF":4.2000,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625001726","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Facial micro-expressions (MEs) are ultra-fine, quick, short-duration muscle movements that express a person’s true feelings. Automatic recognition of MEs with only a few samples is challenging, and the extraction of subtle features becomes crucial. This paper addresses these intricacies and presents a novel dual-branch (branch 1 for node locations and branch 2 for optical flow patch information) masked graph attention network-based approach (MaskGAT) to classify MEs in a video. It utilizes a three-frame graph structure to extract spatio-temporal information. It learns a mask for each node to eliminate the less important node features and propagates the important node features to the neighboring nodes. A masked self-attention graph pooling layer is designed that assigns an attention score to each node, eliminating unwanted nodes and retaining only those with high attention scores. An adaptive frame selection mechanism based on a sliding-window optical flow method is designed to discard low-intensity emotion frames. A well-designed dual-branch fusion mechanism is developed to extract informative features for the final classification of MEs. Furthermore, the paper presents a detailed mathematical analysis and visualization of the MaskGAT pipeline to demonstrate the effectiveness of node feature masking and pooling. The results are presented and compared with state-of-the-art methods on the SMIC, SAMM, CASME II, and MMEW databases. Further, cross-dataset experiments are carried out, and the results are reported.
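The masking-and-pooling step described in the abstract can be sketched as follows. This is a hypothetical illustration only, not the authors' implementation: the linear scoring projection, the sigmoid gating of surviving features, and the `keep_ratio` parameter are all assumptions in the spirit of standard self-attention graph pooling.

```python
import numpy as np

def masked_attention_pool(node_feats, adj, proj, keep_ratio=0.5):
    """Hypothetical sketch of masked self-attention graph pooling:
    score each node, keep only the top-k, and gate the survivors."""
    scores = node_feats @ proj                 # one attention score per node
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[::-1][:k]        # indices of high-score nodes
    keep = np.sort(keep)                       # preserve original node order
    gate = 1.0 / (1.0 + np.exp(-scores[keep])) # sigmoid gate on kept scores
    pooled_feats = node_feats[keep] * gate[:, None]
    pooled_adj = adj[np.ix_(keep, keep)]       # induced subgraph adjacency
    return pooled_feats, pooled_adj, keep
```

With `keep_ratio=0.5` on a four-node graph, the two lowest-scoring nodes are dropped entirely, which mirrors the abstract's idea of eliminating unwanted nodes and propagating only the important features.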
Journal overview:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.