Yu-Xiang Chen, Hong-Mei Sun, Cheng-Yue Che, Shuo Feng, Rui-Sheng Jia
{"title":"情境感知情感识别通过代理-场景交互","authors":"Yu-Xiang Chen, Hong-Mei Sun, Cheng-Yue Che, Shuo Feng, Rui-Sheng Jia","doi":"10.1016/j.engappai.2025.111581","DOIUrl":null,"url":null,"abstract":"<div><div>In real-world scenarios, context-aware emotion recognition (CAER) is a key problem in affective computing with broad application prospects. Most current CAER methods primarily rely on image-level contextual features. However, the interactive relationships between the agent and other objects within the scene are often overlooked or only partially modeled, which limits emotion recognition accuracy. To address this, we proposed a spatial interactive context-aware emotion network (ICENet) that consists of an agent feature extraction branch and a scene-context interaction branch. Specifically, the agent feature extraction branch aims to extract facial and posture features from the target agent and fuse them. In the facial feature extraction network named FaceNet, pure Convolutional Neural Network (ConvNeXt) is used as the backbone to extract global features, and a self-attention-based fine-grained feature extraction (FGFE) module is designed to capture more discriminative local features. In the posture feature extraction network, semantic segmentation is used to extract human silhouettes, which are then processed by Vision Transformer to obtain posture-related features. Meanwhile, the scene-context interaction branch named ObjNet integrates agent’s gaze angle and global depth maps to construct target agent-objects relationship in three-dimensional (TAR3D). Subsequently, a Graph Convolutional Network is employed to model the TAR3D and extract scene-context interaction features. Subsequently, a multiplicative fusion strategy is adopted to integrate agent features with scene-context interaction features, and emotion classification is performed based on the fused representation. Finally, experiments on EMOTIC and CAER-S datasets show that our approach outperforms current state-of-the-art methods in classification accuracy. The code is available at <span><span>https://github.com/Cyx336/ICENet.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"158 ","pages":"Article 111581"},"PeriodicalIF":8.0000,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Context-aware emotion recognition through agent-scene interactions\",\"authors\":\"Yu-Xiang Chen, Hong-Mei Sun, Cheng-Yue Che, Shuo Feng, Rui-Sheng Jia\",\"doi\":\"10.1016/j.engappai.2025.111581\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In real-world scenarios, context-aware emotion recognition (CAER) is a key problem in affective computing with broad application prospects. Most current CAER methods primarily rely on image-level contextual features. However, the interactive relationships between the agent and other objects within the scene are often overlooked or only partially modeled, which limits emotion recognition accuracy. To address this, we proposed a spatial interactive context-aware emotion network (ICENet) that consists of an agent feature extraction branch and a scene-context interaction branch. Specifically, the agent feature extraction branch aims to extract facial and posture features from the target agent and fuse them. 
In the facial feature extraction network named FaceNet, pure Convolutional Neural Network (ConvNeXt) is used as the backbone to extract global features, and a self-attention-based fine-grained feature extraction (FGFE) module is designed to capture more discriminative local features. In the posture feature extraction network, semantic segmentation is used to extract human silhouettes, which are then processed by Vision Transformer to obtain posture-related features. Meanwhile, the scene-context interaction branch named ObjNet integrates agent’s gaze angle and global depth maps to construct target agent-objects relationship in three-dimensional (TAR3D). Subsequently, a Graph Convolutional Network is employed to model the TAR3D and extract scene-context interaction features. Subsequently, a multiplicative fusion strategy is adopted to integrate agent features with scene-context interaction features, and emotion classification is performed based on the fused representation. Finally, experiments on EMOTIC and CAER-S datasets show that our approach outperforms current state-of-the-art methods in classification accuracy. The code is available at <span><span>https://github.com/Cyx336/ICENet.git</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50523,\"journal\":{\"name\":\"Engineering Applications of Artificial Intelligence\",\"volume\":\"158 \",\"pages\":\"Article 111581\"},\"PeriodicalIF\":8.0000,\"publicationDate\":\"2025-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Applications of Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0952197625015830\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625015830","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Context-aware emotion recognition through agent-scene interactions
In real-world scenarios, context-aware emotion recognition (CAER) is a key problem in affective computing with broad application prospects. Most current CAER methods rely primarily on image-level contextual features; the interactive relationships between the agent and other objects in the scene are often overlooked or only partially modeled, which limits emotion recognition accuracy. To address this, we propose a spatial interactive context-aware emotion network (ICENet) that consists of an agent feature extraction branch and a scene-context interaction branch. The agent feature extraction branch extracts facial and posture features from the target agent and fuses them. In the facial feature extraction network, named FaceNet, a pure convolutional network (ConvNeXt) is used as the backbone to extract global features, and a self-attention-based fine-grained feature extraction (FGFE) module is designed to capture more discriminative local features. In the posture feature extraction network, semantic segmentation is used to extract human silhouettes, which are then processed by a Vision Transformer to obtain posture-related features. Meanwhile, the scene-context interaction branch, named ObjNet, integrates the agent's gaze angle with global depth maps to construct a three-dimensional target agent-object relationship graph (TAR3D). A Graph Convolutional Network is then employed to model the TAR3D and extract scene-context interaction features. Finally, a multiplicative fusion strategy integrates the agent features with the scene-context interaction features, and emotion classification is performed on the fused representation. Experiments on the EMOTIC and CAER-S datasets show that our approach outperforms current state-of-the-art methods in classification accuracy. The code is available at https://github.com/Cyx336/ICENet.git.
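To make the two-branch design concrete, below is a minimal PyTorch sketch of an ICENet-style model. Only FaceNet, ObjNet, FGFE, and TAR3D are named in the abstract; every layer size, the mean-pooling steps, the concatenation used to fuse face and posture features, and the pre-extracted TAR3D node features and adjacency are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of an ICENet-style two-branch model (illustrative only).
# Hypothetical choices: layer sizes, pooling, concat fusion of face and
# posture features, and pre-extracted TAR3D node features / adjacency.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny, vit_b_16


class FGFE(nn.Module):
    """Self-attention over ConvNeXt patch tokens for fine-grained facial cues."""

    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                      # tokens: (B, N, dim)
        out, _ = self.attn(tokens, tokens, tokens)  # self-attention over patches
        return self.norm(out + tokens).mean(dim=1)  # pooled local feature


class GCNLayer(nn.Module):
    """One graph-convolution step over the TAR3D agent-object graph."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                      # x: (B, K, d), adj: (B, K, K)
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin((adj / deg) @ x))  # row-normalized aggregation


class ICENetSketch(nn.Module):
    def __init__(self, dim=768, num_classes=26):    # e.g. the 26 EMOTIC categories
        super().__init__()
        # Agent branch: ConvNeXt feature maps for global facial features ...
        self.face_backbone = convnext_tiny(weights=None).features
        self.fgfe = FGFE(dim)
        # ... and a ViT over segmented human silhouettes for posture features.
        self.posture_vit = vit_b_16(weights=None)
        self.posture_vit.heads = nn.Linear(768, dim)
        self.agent_fuse = nn.Linear(2 * dim, dim)   # assumed face/posture fusion
        # Scene-context branch (ObjNet): GCN over TAR3D, whose adjacency is
        # assumed precomputed from the gaze angle and global depth map.
        self.gcn = GCNLayer(dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, face, silhouette, obj_feats, tar3d_adj):
        fmap = self.face_backbone(face)             # (B, 768, h, w)
        tokens = fmap.flatten(2).transpose(1, 2)    # (B, h*w, 768)
        face_feat = tokens.mean(dim=1) + self.fgfe(tokens)  # global + fine-grained
        posture_feat = self.posture_vit(silhouette)
        agent = self.agent_fuse(torch.cat([face_feat, posture_feat], dim=-1))
        context = self.gcn(obj_feats, tar3d_adj).mean(dim=1)  # pooled graph feature
        return self.classifier(agent * context)     # multiplicative fusion


model = ICENetSketch()
logits = model(torch.randn(2, 3, 224, 224),          # cropped face
               torch.randn(2, 3, 224, 224),          # silhouette image
               torch.randn(2, 5, 768),               # 5 object-node features
               torch.ones(2, 5, 5))                  # dense TAR3D adjacency
print(logits.shape)                                  # torch.Size([2, 26])
```

Here the "multiplicative fusion strategy" is read as a plain element-wise product of the pooled agent and context vectors; the abstract does not spell out the exact operator, so treat this as one plausible interpretation rather than the paper's implementation.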
Journal introduction:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advancements emerging across machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI applied to real-world engineering problems, validated on publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.