Optimizing Emotional Insight through Unimodal and Multimodal Long Short-term Memory Models

Impact factor: 1.2, JCR Q3, Multidisciplinary Sciences
Hemin Ibrahim, C. K. Loo, Shreeyash Y. Geda, Abdulbasit K. Al-Talabani
Journal: ARO-The Scientific Journal of Koya University
DOI: 10.14500/aro.11477 (https://doi.org/10.14500/aro.11477)
Published: 2024-06-09
Citations: 0

Abstract

Multimodal emotion recognition is a rapidly growing research area. It analyzes human emotions across multiple modalities, such as acoustic, visual, and language. Emotion recognition is more effective as a multimodal learning task than when it relies on a single modality. In this paper, we present unimodal and multimodal long short-term memory (LSTM) models with a class weight parameter technique for emotion recognition on the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset. A further critical challenge lies in selecting the most effective fusion method for integrating multiple modalities. To address this, we applied four fusion techniques: early fusion, late fusion, deep fusion, and tensor fusion. These fusion methods improved the performance of multimodal emotion recognition compared to unimodal approaches. Because the number of samples per emotion class in the MOSEI dataset is highly imbalanced, adding the class weight parameter technique leads our model to outperform the state of the art on all three modalities (acoustic, visual, and language) as well as on all the fusion models. Class imbalance, which can bias model performance, and the difficulty of choosing an effective fusion method often reduce accuracy on less frequent emotion classes. Our proposed model shows a 2–3% performance improvement in the unimodal setting and a 2% improvement in the multimodal setting over state-of-the-art results.
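The abstract names a class weight technique and several fusion strategies without giving formulas. The sketch below is one plausible reading, not the authors' implementation: the class weights use the common inverse-frequency heuristic (the same formula as scikit-learn's "balanced" mode), and the tensor fusion follows the usual outer-product formulation with a constant 1 appended to each vector. All function names, feature sizes, and labels are hypothetical, and deep fusion is omitted because it depends on the network architecture.

```python
from collections import Counter
from itertools import product


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def early_fusion(acoustic, visual, language):
    """Concatenate the modality features before feeding one shared model."""
    return list(acoustic) + list(visual) + list(language)


def late_fusion(prob_a, prob_v, prob_l):
    """Average the class probabilities produced by three separate models."""
    return [(a + v + l) / 3 for a, v, l in zip(prob_a, prob_v, prob_l)]


def tensor_fusion(acoustic, visual, language):
    """Flattened three-way outer product of the vectors, each extended with 1.

    The appended 1 keeps the unimodal and bimodal terms in the result
    alongside the trimodal interaction terms.
    """
    a = list(acoustic) + [1.0]
    v = list(visual) + [1.0]
    l = list(language) + [1.0]
    return [x * y * z for x, y, z in product(a, v, l)]


# Hypothetical imbalanced label set: rare classes receive larger weights.
labels = ["happy"] * 6 + ["sad"] * 3 + ["fear"] * 1
weights = balanced_class_weights(labels)

fused_early = early_fusion([1, 2], [3], [4, 5])
fused_late = late_fusion([0.6, 0.4], [0.3, 0.7], [0.9, 0.1])
fused_tensor = tensor_fusion([2.0], [3.0], [5.0])
```

Weights of this kind are typically passed to the loss function so that errors on rare emotion classes cost more than errors on frequent ones. Note that the tensor-fusion output grows as the product of the (extended) modality dimensions, which is why it is usually applied to compact embeddings rather than raw features.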
Source journal: ARO-The Scientific Journal of Koya University (Multidisciplinary Sciences)
Self-citation rate: 33.30%
Articles published: 33
Review time: 16 weeks