Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion

IF 4.3 · CAS Tier 3 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Siqi Guo, Mian Wu, Chunhui Zhang, Ling Zhong
{"title":"Emotion recognition in panoramic audio and video virtual reality based on deep learning and feature fusion","authors":"Siqi Guo,&nbsp;Mian Wu,&nbsp;Chunhui Zhang,&nbsp;Ling Zhong","doi":"10.1016/j.eij.2025.100697","DOIUrl":null,"url":null,"abstract":"<div><div>Virtual reality technology has been widely applied in various fields of society, and its content emotion recognition has received much attention. The recognition of emotions in virtual reality content can be employed to regulate emotional states in accordance with the emotional content, to treat mental illness and to assess psychological cognition. Nevertheless, the current research on emotion induction and recognition of virtual reality scenes lacks scientific and quantitative methods for establishing the mapping relationship between virtual reality scenes and emotion labels. Furthermore, the associated methods lack clarity regarding image feature extraction, which contributes to the diminished accuracy of emotion recognition in virtual reality content. To solve the current issue of inaccurate emotion recognition in virtual reality content, this study combines convolutional neural networks and long short-term memory. The attention mechanism and multi-modal feature fusion are introduced to improve the speed of feature extraction and convergence. Finally, an improved algorithm-based emotion recognition model for panoramic audio and video virtual reality is proposed. The average accuracy of the proposed algorithm, XLNet-BIGRU-Attention algorithm, and CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively. The average precision was 98.97%, 97.24% and 97.69%, respectively. The proposed algorithm was significantly superior to the comparison algorithm. A performance comparison was conducted between panoramic audio and video virtual reality emotion recognition models based on the improved algorithm. The improved algorithm’s the mean square error is 0.17 and mean absolute error is 0.19, obviously better than other comparison models. In the analysis of visual classification results, the proposed model has the best classification aggregation effect and is significantly superior to other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model based on the improved algorithm have good effectiveness and practical value.</div></div>","PeriodicalId":56010,"journal":{"name":"Egyptian Informatics Journal","volume":"30 ","pages":"Article 100697"},"PeriodicalIF":4.3000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Egyptian Informatics Journal","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110866525000908","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Virtual reality technology is now widely applied across many fields of society, and emotion recognition for its content has attracted considerable attention. Recognizing the emotions conveyed by virtual reality content makes it possible to regulate emotional states according to that content, to treat mental illness, and to assess psychological cognition. Nevertheless, current research on emotion induction and recognition in virtual reality scenes lacks scientific, quantitative methods for establishing the mapping between virtual reality scenes and emotion labels. Furthermore, the associated methods handle image feature extraction poorly, which reduces the accuracy of emotion recognition in virtual reality content. To address this inaccuracy, this study combines convolutional neural networks with long short-term memory networks, and introduces an attention mechanism and multi-modal feature fusion to speed up feature extraction and convergence. Finally, an emotion recognition model for panoramic audio and video virtual reality based on the improved algorithm is proposed. The average accuracy of the proposed algorithm, the XLNet-BIGRU-Attention algorithm, and the CNN-BiLSTM algorithm was 98.87%, 90.25%, and 86.21%, respectively, and the average precision was 98.97%, 97.24%, and 97.69%, respectively; the proposed algorithm was significantly superior to the comparison algorithms. In a performance comparison of panoramic audio and video virtual reality emotion recognition models, the model based on the improved algorithm achieved a mean square error of 0.17 and a mean absolute error of 0.19, clearly better than the comparison models. In the analysis of visual classification results, the proposed model shows the best classification aggregation and is significantly superior to the other models. Therefore, the improved algorithm and the panoramic audio and video virtual reality emotion recognition model built on it are effective and of practical value.
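The abstract does not give the network details, but the described pipeline (convolutional features extracted per video frame, audio features fused with them, a recurrent layer with attention over time, and an emotion classifier) can be sketched as follows. This is only an illustrative PyTorch sketch under stated assumptions, not the authors' implementation: the layer sizes, the number of emotion classes, the per-frame audio feature dimension, and the simple concatenation used for multi-modal fusion are all assumptions made for the example.

# Minimal sketch of a CNN + LSTM emotion recognizer with attention and
# multi-modal (video + audio) feature fusion. All dimensions and the
# concatenation-based fusion are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class CnnLstmAttentionFusion(nn.Module):
    def __init__(self, num_classes=4, audio_dim=40, hidden_dim=128):
        super().__init__()
        # Per-frame visual encoder: a small CNN applied to each panoramic frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # -> (B*T, 32, 1, 1)
        )
        # Temporal model over the fused (visual + audio) feature sequence.
        self.lstm = nn.LSTM(32 + audio_dim, hidden_dim, batch_first=True)
        # Additive attention scores over time steps.
        self.attn = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames, audio):
        # frames: (B, T, 3, H, W) panoramic video frames
        # audio:  (B, T, audio_dim) per-frame audio features (e.g. log-mel statistics)
        b, t = frames.shape[:2]
        v = self.cnn(frames.flatten(0, 1)).flatten(1)   # (B*T, 32)
        v = v.view(b, t, -1)                            # (B, T, 32)
        fused = torch.cat([v, audio], dim=-1)           # multi-modal fusion by concatenation
        h, _ = self.lstm(fused)                         # (B, T, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)          # (B, T, 1) attention weights
        context = (w * h).sum(dim=1)                    # attention-weighted summary over time
        return self.classifier(context)                 # emotion logits

# Example usage with random tensors standing in for a short VR clip.
model = CnnLstmAttentionFusion()
frames = torch.randn(2, 8, 3, 64, 128)   # batch of 2 clips, 8 frames each
audio = torch.randn(2, 8, 40)
logits = model(frames, audio)            # shape: (2, num_classes)

The attention weights let the model emphasize the time steps of a clip that carry the most emotional information, which is the role the abstract assigns to the attention mechanism; the actual fusion strategy and attention variant used in the paper may differ.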
Source journal: Egyptian Informatics Journal (Decision Sciences: Management Science and Operations Research)
CiteScore: 11.10
Self-citation rate: 1.90%
Articles per year: 59
Review time: 110 days
About the journal: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The Journal provides a forum for state-of-the-art research and development in the fields of computing, including computer science, information technology, information systems, operations research, and decision support. Innovative, previously unpublished work in subjects covered by the Journal is welcome, whether from academic, research, or commercial sources.