SG-TE: Spatial Guidance and Temporal Enhancement Network for Facial-Bodily Emotion Recognition

IF 8.4 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhong Huang, Danni Zhang, Fuji Ren, Min Hu, Juan Liu, Haitao Yu
{"title":"SG-TE: Spatial Guidance and Temporal Enhancement Network for Facial-Bodily Emotion Recognition","authors":"Zhong Huang,&nbsp;Danni Zhang,&nbsp;Fuji Ren,&nbsp;Min Hu,&nbsp;Juan Liu,&nbsp;Haitao Yu","doi":"10.1049/cit2.70006","DOIUrl":null,"url":null,"abstract":"<p>To overcome the deficiencies of single-modal emotion recognition based on facial expression or bodily posture in natural scenes, a spatial guidance and temporal enhancement (SG-TE) network is proposed for facial-bodily emotion recognition. First, ResNet50, DNN and spatial ransformer models are used to capture facial texture vectors, bodily skeleton vectors and whole-body geometric vectors, and an intraframe correlation attention guidance (S-CAG) mechanism, which guides the facial texture vector and the bodily skeleton vector by the whole-body geometric vector, is designed to exploit the spatial potential emotional correlation between face and posture. Second, an interframe significant segment enhancement (T-SSE) structure is embedded into a temporal transformer to enhance high emotional intensity frame information and avoid emotional asynchrony. Finally, an adaptive weight assignment (M-AWA) strategy is constructed to realise facial-bodily fusion. The experimental results on the BabyRobot Emotion Dataset (BRED) and Context-Aware Emotion Recognition (CAER) dataset indicate that the proposed network reaches accuracies of 81.61% and 89.39%, which are 9.61% and 9.46% higher than those of the baseline network, respectively. Compared with the state-of-the-art methods, the proposed method achieves 7.73% and 20.57% higher accuracy than single-modal methods based on facial expression or bodily posture, respectively, and 2.16% higher accuracy than the dual-modal methods based on facial-bodily fusion. Therefore, the proposed method, which adaptively fuses the complementary information of face and posture, improves the quality of emotion recognition in real-world scenarios.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 3","pages":"871-890"},"PeriodicalIF":8.4000,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70006","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.70006","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

To overcome the deficiencies of single-modal emotion recognition based on facial expression or bodily posture in natural scenes, a spatial guidance and temporal enhancement (SG-TE) network is proposed for facial-bodily emotion recognition. First, ResNet50, DNN and spatial transformer models are used to capture facial texture vectors, bodily skeleton vectors and whole-body geometric vectors, and an intraframe correlation attention guidance (S-CAG) mechanism, which guides the facial texture vector and the bodily skeleton vector with the whole-body geometric vector, is designed to exploit the latent spatial emotional correlation between face and posture. Second, an interframe significant segment enhancement (T-SSE) structure is embedded into a temporal transformer to emphasise information from frames of high emotional intensity and avoid emotional asynchrony. Finally, an adaptive weight assignment (M-AWA) strategy is constructed to realise facial-bodily fusion. The experimental results on the BabyRobot Emotion Dataset (BRED) and Context-Aware Emotion Recognition (CAER) dataset indicate that the proposed network reaches accuracies of 81.61% and 89.39%, which are 9.61% and 9.46% higher than those of the baseline network, respectively. Compared with the state-of-the-art methods, the proposed method achieves 7.73% and 20.57% higher accuracy than single-modal methods based on facial expression or bodily posture, respectively, and 2.16% higher accuracy than dual-modal methods based on facial-bodily fusion. Therefore, the proposed method, which adaptively fuses the complementary information of face and posture, improves the quality of emotion recognition in real-world scenarios.
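The abstract describes a three-branch architecture: per-frame facial texture, bodily skeleton and whole-body geometric features, a geometry-guided intraframe attention step (S-CAG), per-stream temporal transformers with frame-significance weighting (T-SSE), and adaptive weighting of the two streams (M-AWA). The following is a minimal PyTorch sketch of how such a pipeline could be wired together; the linear encoder stand-ins (in place of the paper's ResNet50, DNN and spatial transformer), the feature dimensions, the gating form of S-CAG, the softmax significance scoring in T-SSE and the two-way weighting in M-AWA are all assumptions for illustration, not the authors' implementation.

```python
# Minimal PyTorch sketch of the SG-TE pipeline described in the abstract.
# Module sizes, layer choices and names are illustrative assumptions; the
# paper's actual backbones and S-CAG/T-SSE/M-AWA formulations differ.
import torch
import torch.nn as nn


class SCAG(nn.Module):
    """Intraframe correlation attention guidance (assumed form): the
    whole-body geometric vector produces gates that re-weight the facial
    texture and bodily skeleton vectors of the same frame."""
    def __init__(self, dim):
        super().__init__()
        self.face_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.body_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, face, body, geom):
        face = face * self.face_gate(geom)   # geometry-guided facial features
        body = body * self.body_gate(geom)   # geometry-guided skeleton features
        return face, body


class TSSE(nn.Module):
    """Interframe significant segment enhancement (assumed form): a temporal
    transformer followed by per-frame significance scores that emphasise
    high emotional-intensity frames before pooling over time."""
    def __init__(self, dim, heads=4, layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, layers)
        self.score = nn.Linear(dim, 1)

    def forward(self, seq):                        # seq: (batch, frames, dim)
        seq = self.transformer(seq)
        w = torch.softmax(self.score(seq), dim=1)  # frame significance weights
        return (w * seq).sum(dim=1)                # significance-weighted pooling


class SGTE(nn.Module):
    """End-to-end sketch: per-frame encoders -> S-CAG -> T-SSE per stream
    -> M-AWA adaptive modality weighting -> emotion logits."""
    def __init__(self, face_in=512, skel_in=75, geom_in=10, dim=256, classes=7):
        super().__init__()
        # Stand-ins for the ResNet50 / DNN / spatial-transformer extractors.
        self.face_enc = nn.Linear(face_in, dim)
        self.skel_enc = nn.Linear(skel_in, dim)
        self.geom_enc = nn.Linear(geom_in, dim)
        self.scag = SCAG(dim)
        self.face_tsse = TSSE(dim)
        self.body_tsse = TSSE(dim)
        self.awa = nn.Linear(2 * dim, 2)           # M-AWA: adaptive modality weights
        self.head = nn.Linear(dim, classes)

    def forward(self, face_feat, skel_feat, geom_feat):
        # Each input: (batch, frames, feature_dim) for one modality.
        f = self.face_enc(face_feat)
        b = self.skel_enc(skel_feat)
        g = self.geom_enc(geom_feat)
        f, b = self.scag(f, b, g)                  # spatial guidance per frame
        f_vid = self.face_tsse(f)                  # temporal enhancement per stream
        b_vid = self.body_tsse(b)
        w = torch.softmax(self.awa(torch.cat([f_vid, b_vid], dim=-1)), dim=-1)
        fused = w[:, 0:1] * f_vid + w[:, 1:2] * b_vid
        return self.head(fused)


if __name__ == "__main__":
    model = SGTE()
    face = torch.randn(2, 16, 512)   # 16 frames of facial texture features
    skel = torch.randn(2, 16, 75)    # e.g. 25 joints x (x, y, confidence)
    geom = torch.randn(2, 16, 10)    # whole-body geometric descriptors
    print(model(face, skel, geom).shape)  # torch.Size([2, 7])
```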


Source journal

CAAI Transactions on Intelligence Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 11.00
Self-citation rate: 3.90%
Articles published per year: 134
Review time: 35 weeks
Journal introduction: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.