A multi-model deep learning approach for human emotion recognition.

IF 3.9 | Zone 3, Engineering and Technology | JCR Q2, Neurosciences

Cognitive Neurodynamics | Pub Date: 2025-12-01 | Epub Date: 2025-08-02 | DOI: 10.1007/s11571-025-10304-3

Lalitha Arumugam, Samydurai Arumugam, Pabitha Chidambaram, Kumaresan Govindasamy

Cognitive Neurodynamics 19(1): 123. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12317966/pdf/

Citations: 0

Abstract

A multi-model deep learning approach for human emotion recognition.

Emotion recognition is difficult mainly because emotions are expressed through several modalities, including speech, facial expression, and text. In this paper, we introduce the Audio, Visual, and Text Emotions Fusion Network, a framework that integrates these dissimilar inputs to improve on existing approaches to emotion analysis. Each modality is handled by a specialized model. A Graph Attention Network-based Transformer Network uses Graph Attention Networks to detect dependencies among facial regions. A hybrid of Wav2Vec 2.0 and a Convolutional Neural Network extracts informative temporal and frequency-domain audio features. Bidirectional Encoder Representations from Transformers (BERT) combined with a Bidirectional Gated Recurrent Unit captures contextual and sequential text semantics. The modality features are fused by a novel attention-based mechanism that assigns weights according to the emotional context and improves cross-modal interactions. The reported results are an overall accuracy of 98.7%, precision of 98.2%, recall of 97.2%, and an F1-score of 97.49%, which the authors present as evidence that the approach is strong and efficient for real-time emotion recognition.
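The abstract does not spell out how the attention-based fusion is computed. As a rough illustration only, the sketch below fuses three per-modality feature vectors with generic scaled dot-product attention. It is not the authors' mechanism: the function names, the toy 4-dimensional vectors, and the fixed `query` context vector are all assumptions made for this example (in a trained model the query and the features would be learned).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, query):
    """Fuse per-modality vectors with scaled dot-product attention.

    features: dict mapping modality name -> feature vector (equal lengths).
    query: context vector of the same length (hypothetical stand-in for a
           learned "emotional context" representation).
    Returns (fused_vector, weights_by_modality).
    """
    names = list(features)
    dim = len(query)
    # One relevance score per modality: dot(feature, query) / sqrt(dim).
    scores = [
        sum(f * q for f, q in zip(features[n], query)) / math.sqrt(dim)
        for n in names
    ]
    weights = softmax(scores)
    # Fused vector is the attention-weighted sum of the modality vectors.
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))

# Toy embeddings standing in for the outputs of the three encoders.
features = {
    "audio": [0.2, 0.1, 0.0, 0.4],
    "visual": [0.9, 0.3, 0.5, 0.1],
    "text": [0.1, 0.8, 0.2, 0.3],
}
query = [1.0, 0.0, 1.0, 0.0]  # illustrative context vector, not learned

fused, weights = attention_fuse(features, query)
```

With these toy inputs the visual vector aligns best with the query, so it receives the largest weight; the weights always sum to 1, so the fused vector stays within the range spanned by the modality vectors in each dimension.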

Source journal

Cognitive Neurodynamics (Medicine: Neuroscience)

CiteScore: 6.90
Self-citation rate: 18.90%
Articles published: 140
Review time: 12 months

About the journal: Cognitive Neurodynamics provides a unique forum of communication and cooperation for scientists and engineers working in the field of cognitive neurodynamics, intelligent science and applications, bridging the gap between theory and application, without any preference for pure theoretical, experimental or computational models. The emphasis is on publishing original models of cognitive neurodynamics, novel computational theories and experimental results. In particular, intelligent science inspired by cognitive neuroscience and neurodynamics is also very welcome. The scope of Cognitive Neurodynamics covers cognitive neuroscience, neural computation based on dynamics, computer science, intelligent science as well as their interdisciplinary applications in the natural and engineering sciences. Papers that are appropriate for non-specialist readers are encouraged.

1. There is no page limit for manuscripts submitted to Cognitive Neurodynamics. Research papers should clearly represent an important advance of especially broad interest to researchers and technologists in neuroscience, biophysics, BCI, neural computing and intelligent robotics.
2. Cognitive Neurodynamics also welcomes brief communications: short papers reporting results that are of genuinely broad interest but that, for one reason or another, do not make a sufficiently complete story to justify a full article. Brief Communications should consist of approximately four manuscript pages.
3. Cognitive Neurodynamics publishes review articles in which a specific field is reviewed through an exhaustive literature survey. There are no restrictions on the number of pages. Review articles are usually invited, but submitted reviews will also be considered.