Emotion Recognition Method based on Guided Fusion of Facial Expression and Bodily Posture
Zhong Huang, Danni Zhang, Fuji Ren, Min Hu, Liu Juan
2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), 2022-11-26
DOI: 10.1109/ccis57298.2022.10016324
Abstract
Single-modality video emotion recognition based on the face or body alone is easily affected by occlusion, angle deflection, and low emotional intensity. To address this, we propose an emotion recognition method based on guided fusion of facial expression and bodily posture (GF-FB). First, ResNet50 and a DNN are used to obtain an intra-frame facial texture vector and a bodily skeleton vector. Meanwhile, a whole-body geometric feature captured by a transformer encoder is guided by the vectors of the two modalities to obtain a facial enhancement vector and a bodily enhancement vector, respectively. Then, an inter-frame time encoder is designed to describe the spatio-temporal features of the facial enhancement sequence and the bodily enhancement sequence. Finally, a heterogeneous-feature adaptive fusion module is constructed to allocate weights to the facial enhancement branch and the bodily enhancement branch. Experimental results on the BabyRobot Emotion Dataset show that the accuracy of the proposed method reaches 78.22%, which is 6.22% higher than the baseline network.
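To make the pipeline concrete, the sketch below shows one plausible reading of the described architecture in PyTorch: per-frame facial and skeleton vectors, a transformer-encoded whole-body geometric feature used to gate ("guide") each modality, a temporal encoder over the enhanced sequences, and adaptive weighting of the two branches. All module choices, feature dimensions, the sigmoid gating, and the GRU time encoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal illustrative sketch of a guided-fusion pipeline (assumptions, not the paper's code).
import torch
import torch.nn as nn
from torchvision.models import resnet50


class GuidedFusionSketch(nn.Module):
    def __init__(self, d_model=256, num_joints=25, num_classes=7):
        super().__init__()
        # Intra-frame facial texture vector from a ResNet50 backbone (final fc removed).
        backbone = resnet50(weights=None)
        self.face_backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.face_proj = nn.Linear(2048, d_model)
        # Intra-frame bodily skeleton vector from a small DNN over 2D joint coordinates.
        self.skeleton_dnn = nn.Sequential(
            nn.Linear(num_joints * 2, 512), nn.ReLU(), nn.Linear(512, d_model))
        # Whole-body geometric feature from a transformer encoder over joint tokens (assumed design).
        self.joint_embed = nn.Linear(2, d_model)
        self.body_transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        # "Guidance" modeled here as a sigmoid gate conditioned on the whole-body feature.
        self.face_gate = nn.Linear(2 * d_model, d_model)
        self.body_gate = nn.Linear(2 * d_model, d_model)
        # Inter-frame time encoder over the enhanced per-frame sequences (GRU as a stand-in).
        self.time_encoder = nn.GRU(d_model, d_model, batch_first=True)
        # Adaptive fusion: learned weights over the two enhanced branches.
        self.branch_weights = nn.Linear(2 * d_model, 2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, face_frames, joints):
        # face_frames: (B, T, 3, 224, 224); joints: (B, T, num_joints, 2)
        B, T = face_frames.shape[:2]
        face_feat = self.face_backbone(face_frames.flatten(0, 1)).flatten(1)   # (B*T, 2048)
        face_vec = self.face_proj(face_feat).view(B, T, -1)                    # (B, T, d)
        body_vec = self.skeleton_dnn(joints.flatten(2)).view(B, T, -1)         # (B, T, d)
        geom = self.body_transformer(self.joint_embed(joints.flatten(0, 1)))   # (B*T, J, d)
        geom = geom.mean(dim=1).view(B, T, -1)                                 # (B, T, d)
        # Enhancement vectors: each modality gated by the whole-body geometric feature.
        face_enh = face_vec * torch.sigmoid(self.face_gate(torch.cat([face_vec, geom], -1)))
        body_enh = body_vec * torch.sigmoid(self.body_gate(torch.cat([body_vec, geom], -1)))
        face_seq, _ = self.time_encoder(face_enh)
        body_seq, _ = self.time_encoder(body_enh)
        face_last, body_last = face_seq[:, -1], body_seq[:, -1]
        # Adaptive weight allocation between the facial and bodily enhancement branches.
        w = torch.softmax(self.branch_weights(torch.cat([face_last, body_last], -1)), dim=-1)
        fused = w[:, :1] * face_last + w[:, 1:] * body_last
        return self.classifier(fused)


if __name__ == "__main__":
    model = GuidedFusionSketch()
    logits = model(torch.randn(2, 8, 3, 224, 224), torch.randn(2, 8, 25, 2))
    print(logits.shape)  # torch.Size([2, 7])
```

The gating and GRU are stand-ins; the paper's own guidance mechanism, time encoder, and fusion weighting may differ in detail.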