José Salas-Cáceres, Javier Lorenzo-Navarro, David Freire-Obregón, Modesto Castrillón-Santana
DOI: 10.1007/s11042-024-20227-6
Journal: Multimedia Tools and Applications (JCR Q2, Computer Science, Information Systems; Impact Factor 3.0)
Publication date: 2024-09-18
Publication type: Journal Article
Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics
In the Human-Machine Interactions (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve over time, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that augment the main network with different architectures, such as autoencoders (AE) or an Embracement module, to combine information from multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates the different multimodal approaches with the temporal modeling capabilities of the LSTM architecture, was tested on three public datasets: RAVDESS, SAVEE, and CREMA-D. It achieved state-of-the-art accuracies of 88.11%, 86.75%, and 80.27%, respectively, outperforming other existing approaches.
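To make the general idea concrete, the following is a minimal, self-contained sketch of audiovisual fusion followed by temporal modeling with an LSTM. All dimensions, the concatenation-based fusion, the hand-rolled LSTM cell, and the toy random features are assumptions for illustration only; they do not reproduce the authors' exact architecture, the AE or Embracement fusion modules, or any trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SimpleLSTM:
    """A single-layer LSTM with randomly initialized weights (illustrative only)."""

    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the four gates (input, forget, cell, output).
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)

    def forward(self, sequence):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in sequence:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)   # update cell state
            h = o * np.tanh(c)           # update hidden state
        return h  # final hidden state summarizes the whole sequence

# Fabricated toy inputs: 20 frames of 64-d visual and 32-d audio features.
T, visual_dim, audio_dim, hidden_dim, n_emotions = 20, 64, 32, 16, 8
visual = rng.standard_normal((T, visual_dim))
audio = rng.standard_normal((T, audio_dim))

# Fuse modalities per frame (simple concatenation stands in for the
# paper's AE / Embracement fusion modules), then model temporal dynamics.
fused = np.concatenate([visual, audio], axis=1)
lstm = SimpleLSTM(visual_dim + audio_dim, hidden_dim)
summary = lstm.forward(fused)

# Linear classifier head over the temporal summary.
W_out = rng.standard_normal((n_emotions, hidden_dim)) * 0.1
logits = W_out @ summary
predicted = int(np.argmax(logits))
print(predicted)  # index of the (untrained, hence arbitrary) predicted emotion class
```

With random weights the prediction is of course meaningless; the point of the sketch is the data flow: per-frame fusion of the two modalities first, then a recurrent pass that condenses the frame sequence into a single representation for classification.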
Journal Introduction:
Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed.
Specific areas of interest include:
- Multimedia Tools:
- Multimedia Applications:
- Prototype multimedia systems and platforms