Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model

Licai Sun, Mingyu Xu, Zheng Lian, B. Liu, J. Tao, Meng Wang, Yuan Cheng
{"title":"Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model","authors":"Licai Sun, Mingyu Xu, Zheng Lian, B. Liu, J. Tao, Meng Wang, Yuan Cheng","doi":"10.1145/3475957.3484456","DOIUrl":null,"url":null,"abstract":"With the proliferation of user-generated videos in online websites, it becomes particularly important to achieve automatic perception and understanding of human emotion/sentiment from these videos. In this paper, we present our solutions to the MuSe-Wilder and MuSe-Sent sub-challenges in MuSe 2021 Multimodal Sentiment Analysis Challenge. MuSe-Wilder focuses on continuous emotion (i.e., arousal and valence) recognition while the task of MuSe-Sent concentrates on discrete sentiment classification. To this end, we first extract a variety of features from three common modalities (i.e., audio, visual, and text), including both low-level handcrafted features and high-level deep representations from supervised/unsupervised pre-trained models. Then, the long short-term memory recurrent neural network, as well as the self-attention mechanism is employed to model the complex temporal dependencies in the feature sequence. The concordance correlation coefficient (CCC) loss and F1-loss are used to guide continuous regression and discrete classification, respectively. To further boost the model's performance, we adopt late fusion to exploit complementary information from different modalities. Our proposed method achieves CCCs of 0.4117 and 0.6649 for arousal and valence respectively on the test set of MuSe-Wilder, which outperforms the baseline system (i.e., 0.3386 and 0.5974) by a large margin. For MuSe-Sent, F1-scores of 0.3614 and 0.4451 for arousal and valence are obtained, which also outperforms the baseline system significantly (i.e., 0.3512 and 0.3291). With these promising results, we ranked top3 in both sub-challenges.","PeriodicalId":313996,"journal":{"name":"Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3475957.3484456","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

With the proliferation of user-generated videos on online websites, it has become particularly important to automatically perceive and understand human emotion/sentiment from these videos. In this paper, we present our solutions to the MuSe-Wilder and MuSe-Sent sub-challenges of the MuSe 2021 Multimodal Sentiment Analysis Challenge. MuSe-Wilder focuses on continuous emotion (i.e., arousal and valence) recognition, while MuSe-Sent concentrates on discrete sentiment classification. To this end, we first extract a variety of features from three common modalities (audio, visual, and text), including both low-level handcrafted features and high-level deep representations from supervised/unsupervised pre-trained models. Then, a long short-term memory (LSTM) recurrent neural network, together with a self-attention mechanism, is employed to model the complex temporal dependencies in the feature sequence. The concordance correlation coefficient (CCC) loss and the F1 loss are used to guide continuous regression and discrete classification, respectively. To further boost performance, we adopt late fusion to exploit complementary information from different modalities. Our proposed method achieves CCCs of 0.4117 (arousal) and 0.6649 (valence) on the MuSe-Wilder test set, outperforming the baseline system (0.3386 and 0.5974) by a large margin. For MuSe-Sent, it obtains F1-scores of 0.3614 (arousal) and 0.4451 (valence), again clearly surpassing the baseline (0.3512 and 0.3291). With these promising results, we ranked in the top 3 in both sub-challenges.
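For readers unfamiliar with the two core ingredients named in the abstract, the attention-enhanced recurrent model and the CCC objective, the following is a minimal, hypothetical PyTorch sketch. It is not the authors' released code: the class and function names, layer sizes, and tensor shapes are assumptions made only to illustrate a BiLSTM followed by self-attention over time and a 1 − CCC regression loss.

```python
import torch
import torch.nn as nn

class AttentionEnhancedLSTM(nn.Module):
    """Hypothetical sketch: per-frame features -> BiLSTM -> self-attention -> frame-wise regression."""
    def __init__(self, feat_dim: int, hidden_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, feat_dim)
        h, _ = self.lstm(x)              # (batch, time, 2 * hidden_dim)
        a, _ = self.attn(h, h, h)        # self-attention over the time axis
        return self.head(a).squeeze(-1)  # (batch, time) continuous predictions

def ccc_loss(pred: torch.Tensor, gold: torch.Tensor) -> torch.Tensor:
    """1 - concordance correlation coefficient between two 1-D value sequences."""
    pred_mean, gold_mean = pred.mean(), gold.mean()
    pred_var, gold_var = pred.var(unbiased=False), gold.var(unbiased=False)
    covar = ((pred - pred_mean) * (gold - gold_mean)).mean()
    ccc = 2.0 * covar / (pred_var + gold_var + (pred_mean - gold_mean) ** 2)
    return 1.0 - ccc

# Toy usage with random data (all shapes are assumptions):
model = AttentionEnhancedLSTM(feat_dim=88)   # e.g., an eGeMAPS-sized audio feature vector
feats = torch.randn(2, 500, 88)              # (batch, frames, feature dim)
labels = torch.randn(2, 500)                 # frame-level arousal or valence annotations
loss = ccc_loss(model(feats).reshape(-1), labels.reshape(-1))
loss.backward()
```

The CCC loss penalizes both low correlation and mean/scale mismatch between predictions and annotations, which is why it is the standard objective for continuous arousal/valence regression; the F1 loss used for MuSe-Sent and the late-fusion step are not sketched here.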