Optimizing Emotional Insight through Unimodal and Multimodal Long Short-term Memory Models

IF 16.4 | Zone 1, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY

Accounts of Chemical Research | Pub Date: 2024-06-09 | DOI: 10.14500/aro.11477

Hemin Ibrahim, C. K. Loo, Shreeyash Y. Geda, Abdulbasit K. Al-Talabani

Citations: 0

Abstract

Multimodal emotion recognition is a rapidly growing research area. It analyzes human emotions across multiple modalities, such as acoustic, visual, and language. Emotion recognition is more effective when treated as a multimodal learning task than when it relies on a single modality. In this paper, we present unimodal and multimodal long short-term memory (LSTM) models with a class weight parameter technique for emotion recognition on the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset. A critical challenge lies in selecting the most effective fusion method for integrating multiple modalities. To address this, we applied four fusion techniques: early fusion, late fusion, deep fusion, and tensor fusion. These fusion methods improved the performance of multimodal emotion recognition compared to unimodal approaches. A second challenge is class imbalance, which can bias a model and reduce its accuracy on less frequent emotion classes. Because the number of samples per emotion class in the MOSEI dataset is highly imbalanced, adding the class weight parameter technique leads our model to outperform the state of the art on all three modalities (acoustic, visual, and language) as well as on all the fusion models. Our proposed model shows a 2–3% performance improvement in the unimodal setting and a 2% improvement in the multimodal setting over state-of-the-art results.
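The abstract names a class weight technique and several fusion strategies but does not give their formulas. The following is a minimal NumPy sketch of common textbook versions, not the authors' implementation. The function names, the inverse-frequency weighting rule, the averaging rule for late fusion, and the appended-constant outer product for tensor fusion are all assumptions chosen for illustration. Deep fusion is omitted because its definition varies between papers.

```python
import numpy as np

def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: n_samples / (num_classes * count_c).

    Assumption: every class occurs at least once in `labels`.
    Rare classes receive larger weights, frequent classes smaller ones.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return len(labels) / (num_classes * counts)

def early_fusion(acoustic, visual, language):
    """Concatenate per-modality feature vectors before a shared model."""
    return np.concatenate([acoustic, visual, language], axis=-1)

def late_fusion(per_modality_probs):
    """Average class probabilities predicted by separate unimodal models."""
    return np.mean(per_modality_probs, axis=0)

def tensor_fusion(acoustic, visual, language):
    """Outer product of the three embeddings, each extended with a constant 1.

    The appended 1 keeps unimodal and bimodal terms in the fused vector
    alongside the trimodal interaction terms.
    """
    def with_one(x):
        return np.concatenate([x, np.ones((x.shape[0], 1))], axis=-1)

    a, v, l = with_one(acoustic), with_one(visual), with_one(language)
    fused = np.einsum("bi,bj,bk->bijk", a, v, l)
    return fused.reshape(fused.shape[0], -1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embedding sizes: 3 (acoustic), 4 (visual), 5 (language).
    a = rng.normal(size=(2, 3))
    v = rng.normal(size=(2, 4))
    l = rng.normal(size=(2, 5))
    print(early_fusion(a, v, l).shape)   # concatenated width is 3 + 4 + 5
    print(tensor_fusion(a, v, l).shape)  # flattened width is 4 * 5 * 6
    print(balanced_class_weights(np.array([0, 0, 0, 1]), num_classes=2))
```

The resulting weights would typically scale each sample's contribution to the training loss, so that errors on rare emotion classes cost more than errors on common ones. Tensor fusion grows multiplicatively with the embedding sizes, which is one reason the choice of fusion method is a real trade-off.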

Source journal

Accounts of Chemical Research (Chemistry: Multidisciplinary)

CiteScore

31.40

Self-citation rate

1.10%

Articles published

312

Review time

2 months

Journal description: Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author's own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research, affording the reader a closer look at the content and significance of an article. By providing a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.