{"title":"Multimodal Emotion Fusion Mechanism and Empathetic Responses in Companion Robots","authors":"Xiaofeng Liu;Qincheng Lv;Jie Li;Siyang Song;Angelo Cangelosi","doi":"10.1109/TCDS.2024.3442203","DOIUrl":null,"url":null,"abstract":"The ability of humanoid robots to exhibit empathetic facial expressions and provide corresponding responses is essential for natural human–robot interaction. To enhance this, we integrate the GPT3.5 model with a facial expression recognition model, creating a multimodal emotion recognition system. Additionally, we address the challenge of realistically mimicking human facial expressions by designing the physical structure of a humanoid robot. Initially, we develop a humanoid robot capable of adjusting the positions of its facial organs and neck through servo displacement to achieve more natural facial expressions. Subsequently, to overcome the current limitation where emotional interaction robots struggle to accurately recognize user emotions, we introduce a coupled generative pretrained transformer (GPT)-based multimodal emotion recognition method that utilizes both text and images, thereby enhancing the robot's emotion recognition accuracy. Finally, we integrate the GPT-3.5 model to generate empathetic responses based on recognized user emotional states and language text, which are then mapped onto the robot to enable empathetic expressions that can achieve a more comfortable human–machine interaction experience. Experimental results on benchmark databases demonstrate that the performance of the coupled GPT-based multimodal emotion recognition method using text and images outperforms other approaches, and it possesses unique empathetic response capabilities relative to alternative methods.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 2","pages":"271-286"},"PeriodicalIF":4.9000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10634513/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
The ability of humanoid robots to exhibit empathetic facial expressions and provide corresponding responses is essential for natural human–robot interaction. To this end, we integrate the GPT-3.5 model with a facial expression recognition model to create a multimodal emotion recognition system, and we address the challenge of realistically mimicking human facial expressions through the physical design of a humanoid robot. First, we develop a humanoid robot that adjusts the positions of its facial organs and neck through servo displacement to produce more natural facial expressions. Then, to overcome the current limitation that emotional interaction robots struggle to accurately recognize user emotions, we introduce a coupled generative pretrained transformer (GPT)-based multimodal emotion recognition method that uses both text and images, improving the robot's emotion recognition accuracy. Finally, we use the GPT-3.5 model to generate empathetic responses from the recognized user emotional state and language text; these responses are then mapped onto the robot, enabling empathetic expressions and a more comfortable human–machine interaction experience. Experimental results on benchmark databases demonstrate that the coupled GPT-based multimodal emotion recognition method using text and images outperforms other approaches and offers empathetic response capabilities that alternative methods lack.
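The abstract outlines a three-stage pipeline: per-modality emotion recognition (face image and utterance text), fusion of the two predictions, and an empathetic reply that is also mapped to the robot's servo-driven expression. The Python sketch below illustrates how such a text-plus-image late-fusion step could be wired up. It is not the authors' implementation: every name in it (recognize_face_emotion, recognize_text_emotion, query_gpt35, SERVO_POSES, the fusion weight alpha) is a hypothetical placeholder, and the stubbed branches return fixed distributions only so the sketch runs without model weights or an API key.

# Illustrative sketch only: late fusion of a visual and a GPT-based text
# emotion classifier, followed by an empathetic reply and a servo pose.
# All interfaces below are hypothetical placeholders, not the paper's code.

EMOTIONS = ("happy", "sad", "angry", "surprised", "neutral")

# Hypothetical servo targets (degrees) for facial-organ and neck joints.
SERVO_POSES = {
    "happy":     {"brow": 15,  "mouth_corner": 25,  "neck_pitch": 0},
    "sad":       {"brow": -10, "mouth_corner": -20, "neck_pitch": -8},
    "angry":     {"brow": -20, "mouth_corner": -10, "neck_pitch": 0},
    "surprised": {"brow": 25,  "mouth_corner": 10,  "neck_pitch": 5},
    "neutral":   {"brow": 0,   "mouth_corner": 0,   "neck_pitch": 0},
}

def recognize_face_emotion(image) -> dict:
    """Stand-in for the facial-expression model: probability per emotion."""
    # A real system would run a classifier on the face crop; fixed values
    # keep this sketch self-contained.
    return {"happy": 0.70, "sad": 0.05, "angry": 0.05,
            "surprised": 0.10, "neutral": 0.10}

def recognize_text_emotion(utterance: str) -> dict:
    """Stand-in for the GPT-based text branch (e.g., a classification prompt)."""
    return {"happy": 0.60, "sad": 0.10, "angry": 0.05,
            "surprised": 0.15, "neutral": 0.10}

def query_gpt35(prompt: str) -> str:
    """Placeholder for a GPT-3.5 chat-completion call."""
    return "That's wonderful news, congratulations! How did you celebrate?"

def fuse(p_face: dict, p_text: dict, alpha: float = 0.6) -> str:
    """Late fusion: weighted sum of the two modality distributions."""
    fused = {e: alpha * p_face.get(e, 0.0) + (1.0 - alpha) * p_text.get(e, 0.0)
             for e in EMOTIONS}
    return max(fused, key=fused.get)

def respond(image, utterance: str):
    emotion = fuse(recognize_face_emotion(image),
                   recognize_text_emotion(utterance))
    reply = query_gpt35(
        f"The user said: '{utterance}' and appears {emotion}. "
        "Reply with one short empathetic sentence."
    )
    return reply, SERVO_POSES[emotion]   # verbal reply + expression to enact

if __name__ == "__main__":
    reply, pose = respond(image=None, utterance="I passed my exam today!")
    print(reply)
    print(pose)

The weighted sum here is only a stand-in for the paper's coupled fusion mechanism; the sketch is meant to show the data flow from the two recognizers to the reply and the servo pose, not the fusion method itself.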
Journal Introduction:
The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.