Title: A novel facial expression recognition model based on harnessing complementary features in multi-scale network with attention fusion
Journal: Image and Vision Computing
Publication date: 2024-07-14
Publication type: Journal Article
DOI: 10.1016/j.imavis.2024.105183
URL: https://www.sciencedirect.com/science/article/pii/S0262885624002889
Impact factor: 4.2
JCR: Q2 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
This paper presents a novel method for facial expression recognition: a feature complementation and multi-scale attention model with attention fusion (FCMSA-AF). The model consists of four main components: a shallow feature extractor module, a parallel two-branch multi-scale attention (MSA) module, a feature complementing module (FCM), and an attention fusion and classification module. The MSA module cascades multi-scale attention modules along two paths to learn diverse features. The upper and lower paths use left and right multi-scale blocks to extract and aggregate features at different receptive fields. The attention networks in the MSA module focus on salient local regions to extract fine-grained features. The FCM uses the correlation between the feature maps of the two paths to make the multi-scale attention features complementary to each other. Finally, the complementary features are fused through an attention network to form an informative holistic feature that includes subtle, visually varying regions in similar classes. Classification therefore relies on complementary and informative features, with the aim of minimizing information loss and capturing the finer discriminating aspects of facial expressions. Evaluated on the AffectNet and CK+ datasets, the proposed model achieves accuracies of 64.59% and 98.98%, respectively, outperforming some state-of-the-art methods.
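The abstract does not say how the FCM uses the correlation or how the attention fusion is parameterized, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes each path yields a flat feature vector, measures redundancy between the two paths with a Pearson correlation, and fuses them with softmax attention weights. The function names and the scoring vectors `w_upper` and `w_lower` are hypothetical stand-ins for learned parameters.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(upper, lower, w_upper, w_lower):
    """Fuse two path features with softmax attention weights.

    w_upper and w_lower are hypothetical scoring vectors; in a trained
    model they would be learned, here they are supplied by the caller.
    """
    score_u = sum(w * x for w, x in zip(w_upper, upper))
    score_l = sum(w * x for w, x in zip(w_lower, lower))
    alpha_u, alpha_l = softmax([score_u, score_l])
    fused = [alpha_u * u + alpha_l * l for u, l in zip(upper, lower)]
    return fused, (alpha_u, alpha_l)


# Made-up feature vectors standing in for the upper and lower path outputs.
upper = [0.9, 0.1, 0.4, 0.0]
lower = [0.1, 0.8, 0.3, 0.5]

redundancy = pearson(upper, lower)  # near 1 would mean the paths overlap heavily
fused, weights = attention_fuse(upper, lower, [1.0] * 4, [1.0] * 4)
```

A low or negative `redundancy` value suggests the two paths carry different information, which is the situation where a weighted fusion has the most to gain. How the paper turns that correlation into a complementing operation is not described in the abstract.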
About the journal:
The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.