{"title":"Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion","authors":"Siqi Guo, Mian Wu, Chunhui Zhang, Ling Zhong","doi":"10.1016/j.eij.2025.100697","DOIUrl":null,"url":null,"abstract":"<div><div>Virtual reality technology has been widely applied in various fields of society, and its content emotion recognition has received much attention. The recognition of emotions in virtual reality content can be employed to regulate emotional states in accordance with the emotional content, to treat mental illness and to assess psychological cognition. Nevertheless, current research on emotion induction and recognition in virtual reality scenes lacks scientific and quantitative methods for establishing the mapping relationship between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which contributes to the diminished accuracy of emotion recognition in virtual reality content. To solve the current issue of inaccurate emotion recognition in virtual reality content, this study combines convolutional neural networks and long short-term memory. The attention mechanism and multi-modal feature fusion are introduced to improve the speed of feature extraction and convergence. Finally, an improved algorithm-based emotion recognition model for panoramic audio and video virtual reality is proposed. The average accuracy of the proposed algorithm, the XLNet-BIGRU-Attention algorithm, and the CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively. The average precision was 98.97%, 97.24% and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithms. A performance comparison was then conducted between the panoramic audio and video virtual reality emotion recognition model based on the improved algorithm and other models. The improved algorithm’s mean square error is 0.17 and its mean absolute error is 0.19, clearly better than those of the comparison models. In the analysis of visual classification results, the proposed model has the best classification aggregation effect and is significantly superior to the other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on it have good effectiveness and practical value.</div></div>","PeriodicalId":56010,"journal":{"name":"Egyptian Informatics Journal","volume":"30 ","pages":"Article 100697"},"PeriodicalIF":4.3000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Egyptian Informatics Journal","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110866525000908","RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
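The abstract describes an attention mechanism combined with multi-modal feature fusion but gives no implementation details. A minimal, hypothetical sketch of attention-weighted fusion of audio and video feature vectors might look like the following; the function names, scores, and dimensions are illustrative assumptions, not taken from the paper:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_features(audio_feat, video_feat, audio_score, video_score):
    """Attention-weighted fusion of two equal-length modality feature vectors.

    audio_score / video_score stand in for learned attention logits (assumed);
    the fused vector is the weighted sum of the two modality features.
    """
    w_audio, w_video = softmax([audio_score, video_score])
    return [w_audio * a + w_video * v for a, v in zip(audio_feat, video_feat)]

# Toy example: 4-dimensional features from each modality.
audio = [0.2, 0.4, 0.1, 0.3]
video = [0.6, 0.1, 0.2, 0.5]
fused = fuse_features(audio, video, audio_score=1.0, video_score=0.0)
```

In a full model, the attention logits would themselves be produced by a learned layer over the CNN and LSTM outputs; this sketch only shows the weighting step.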
Citations: 0
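The error figures reported in the abstract (mean square error 0.17, mean absolute error 0.19) follow the standard definitions of those metrics. A small self-contained illustration of how such metrics are computed is below; the label and prediction values are made up for demonstration and are not the paper's data:

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative emotion-intensity labels and model outputs (hypothetical).
y_true = [0.9, 0.2, 0.5, 0.7]
y_pred = [0.8, 0.4, 0.5, 0.3]
mse = mean_squared_error(y_true, y_pred)   # (0.01 + 0.04 + 0.0 + 0.16) / 4 = 0.0525
mae = mean_absolute_error(y_true, y_pred)  # (0.1 + 0.2 + 0.0 + 0.4) / 4 = 0.175
```

Lower values are better for both metrics, which is why the reported 0.17/0.19 figures indicate smaller regression error than the comparison models.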
Journal introduction:
The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer sciences, information technologies, information systems, operations research and decision support. Authors from academic, research and commercial sources are encouraged to submit innovative, previously unpublished work in subjects covered by the Journal.