A novel facial expression recognition model based on harnessing complementary features in a multi-scale network with attention fusion

IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Citations: 0

Abstract


This paper presents a novel method for facial expression recognition using the proposed feature complementation and multi-scale attention model with attention fusion (FCMSA-AF). The proposed model consists of four main components: the shallow feature extractor module, parallel-structured two-branch multi-scale attention module (MSA), feature complementing module (FCM), and attention fusion and classification module. The MSA module contains multi-scale attention modules in a cascaded fashion in two paths to learn diverse features. The upper and lower paths use left and right multi-scale blocks to extract and aggregate the features at different receptive fields. The attention networks in MSA focus on salient local regions to extract features at granular levels. The FCM uses the correlation between the feature maps in the two paths to make the multi-scale attention features complementary to each other. Finally, the complementary features are fused through an attention network to form an informative holistic feature which includes subtle, visually varying regions in similar classes. Hence, complementary and informative features are used in classification to minimize information loss and capture the discriminating finer aspects of facial expression recognition. Experimental evaluation of the proposed model carried out on the AffectNet and CK+ datasets achieves accuracies of 64.59% and 98.98%, respectively, outperforming some of the state-of-the-art methods.
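The feature-complementation and attention-fusion ideas described in the abstract can be illustrated with a toy sketch. To be clear: in the paper, the FCM and fusion modules are learned CNN components; the function names (`pearson`, `complement_and_fuse`), the correlation-based weighting scheme, and the mean-activation attention scores below are illustrative assumptions, not the authors' method.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def complement_and_fuse(upper, lower):
    """Toy analogue of FCM + attention fusion: each path keeps its own
    features plus a share of the other path scaled by how *uncorrelated*
    the two paths are, then the complemented vectors are fused with
    softmax attention weights."""
    rho = pearson(upper, lower)  # cross-path correlation
    comp_u = [u + (1.0 - abs(rho)) * l for u, l in zip(upper, lower)]
    comp_l = [l + (1.0 - abs(rho)) * u for u, l in zip(upper, lower)]
    # hypothetical attention scores: mean activation of each path
    scores = [sum(comp_u) / len(comp_u), sum(comp_l) / len(comp_l)]
    w_u, w_l = softmax(scores)
    return [w_u * u + w_l * l for u, l in zip(comp_u, comp_l)]

# Example: fuse two 3-dimensional path features into one holistic feature.
fused = complement_and_fuse([0.2, 0.9, 0.1], [0.8, 0.1, 0.7])
```

The design intent this sketch mirrors is that when the two paths are highly correlated (redundant), little is exchanged between them, whereas decorrelated (complementary) features contribute more to the fused representation.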

Source journal
Image and Vision Computing (Engineering & Technology — Electrical & Electronic Engineering)
CiteScore: 8.50
Self-citation rate: 8.50%
Annual articles: 143
Review time: 7.8 months
Journal introduction: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.