Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks

Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge
Pub Date: 2015-10-26
DOI: 10.1145/2808196.2811638

Shizhe Chen, Qin Jin

Citations: 92

Abstract

Emotion recognition is an active research area with wide applications and significant open challenges. This paper presents our submission to the Audio/Visual Emotion Challenge (AVEC2015), whose goal is to use audio, visual, and physiological signals to continuously predict the values of two emotion dimensions, arousal and valence. Our system applies Recurrent Neural Networks (RNNs) to model temporal information. We explore several ways to improve prediction performance: the dominant modalities for arousal and valence prediction, the duration of features, novel loss functions, the direction of Long Short-Term Memory (LSTM) networks, multi-task learning, and different structures for early feature fusion and late fusion. The best settings are chosen according to performance on the development set. Experimental results that are competitive with the baseline show the effectiveness of the proposed methods.
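The abstract contrasts early feature fusion with late fusion without giving details. As a rough illustration of the general distinction only, the following sketch concatenates per-frame features across modalities (early fusion) and averages per-modality prediction sequences (late fusion). The function names, array shapes, toy values, and the weighted-average combiner are all assumptions made for this example; they are not taken from the paper and need not match the structures the authors evaluated.

```python
import numpy as np

def early_fusion(features_by_modality):
    """Concatenate per-frame feature vectors from all modalities
    along the feature axis (hypothetical helper)."""
    return np.concatenate(features_by_modality, axis=1)

def late_fusion(predictions_by_modality, weights):
    """Combine per-modality prediction sequences with a normalized
    weighted average (an assumed combiner, chosen for simplicity)."""
    preds = np.stack(predictions_by_modality, axis=0)  # (n_modalities, n_frames)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                    # weights sum to 1
    return np.tensordot(w, preds, axes=1)              # (n_frames,)

# Toy data: 5 frames, 3 audio features and 4 visual features per frame.
rng = np.random.default_rng(0)
audio = rng.normal(size=(5, 3))
video = rng.normal(size=(5, 4))
fused = early_fusion([audio, video])                   # shape (5, 7)

# Toy per-modality predictions for one emotion dimension.
audio_pred = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
video_pred = np.array([0.3, 0.2, 0.1, 0.0, -0.1])
combined = late_fusion([audio_pred, video_pred], weights=[3.0, 1.0])
```

In early fusion a single temporal model would consume the fused array, whereas in late fusion each modality has its own model and only their outputs are merged; the weights here are arbitrary and would in practice be tuned on a development set.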


