Fusion in Context: A Multimodal Approach to Affective State Recognition
Youssef Mohamed, Severin Lemaignan, Arzu Guneysu, Patric Jensfelt, Christian Smith
arXiv - CS - Robotics, arXiv:2409.11906, published 2024-09-18
Abstract
Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities such as facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused using additive fusion and processed by a shared transformer encoder to capture temporal dependencies and interactions. The proposed method is evaluated on a dataset collected from participants engaged in a tangible tabletop Pacman game designed to induce various affective states. Our results demonstrate the effectiveness of incorporating contextual information and multimodal fusion for affective state recognition.
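The abstract describes the pipeline only at a high level (modality-specific encoders, additive fusion, a shared transformer encoder, and a classification head). The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline, not the authors' implementation: all layer sizes, feature dimensions (thermal, action-unit, and text-context widths), and the number of affect classes are illustrative assumptions, and the modalities are assumed to be time-aligned so that additive fusion can be applied element-wise.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Per-modality encoder: projects raw features into a shared embedding space."""

    def __init__(self, input_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(input_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):          # x: (batch, time, input_dim)
        return self.proj(x)        # -> (batch, time, embed_dim)


class AdditiveFusionTransformer(nn.Module):
    """Modality encoders -> additive fusion -> shared transformer -> classifier."""

    def __init__(self, modality_dims, embed_dim=64, num_heads=4,
                 num_layers=2, num_classes=4):
        super().__init__()
        self.encoders = nn.ModuleList(
            [ModalityEncoder(d, embed_dim) for d in modality_dims]
        )
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, inputs):
        # inputs: list of per-modality tensors, each (batch, time, dim_i),
        # assumed time-aligned across modalities.
        encoded = [enc(x) for enc, x in zip(self.encoders, inputs)]
        fused = torch.stack(encoded, dim=0).sum(dim=0)   # additive fusion
        hidden = self.transformer(fused)                  # temporal dependencies
        pooled = hidden.mean(dim=1)                       # average over time
        return self.classifier(pooled)                    # affect-class logits


# Illustrative feature sizes (assumptions): thermal (8), action units (17),
# text-context embedding (32); batch of 2 sequences, 50 time steps each.
model = AdditiveFusionTransformer(modality_dims=[8, 17, 32])
batch = [torch.randn(2, 50, d) for d in (8, 17, 32)]
logits = model(batch)  # shape: (2, num_classes)
```

Additive fusion keeps the fused sequence at the same length and width as each encoded modality, so a single shared transformer can model temporal structure over the combined signal; richer schemes (e.g., concatenation or cross-attention) would change the shapes accordingly.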