Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen
{"title":"Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition","authors":"Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen","doi":"10.1145/2808196.2811634","DOIUrl":null,"url":null,"abstract":"This paper presents our effort to the Audio/Visual+ Emotion Challenge (AV+EC2015), whose goal is to predict the continuous values of the emotion dimensions arousal and valence from audio, visual and physiology modalities. The state of art classifier for dimensional recognition, long short term memory recurrent neural network (LSTM-RNN) is utilized. Except regular LSTM-RNN prediction architecture, two techniques are investigated for dimensional emotion recognition problem. The first one is ε -insensitive loss is utilized as the loss function to optimize. Compared to squared loss function, which is the most widely used loss function for dimension emotion recognition, ε -insensitive loss is more robust for the label noises and it can ignore small errors to get stronger correlation between predictions and labels. The other one is temporal pooling. This technique enables temporal modeling in the input features and increases the diversity of the features fed into the forward prediction architecture. Experiments results show the efficiency of key points of the proposed method and competitive results are obtained.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"116","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2808196.2811634","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 116
Abstract
This paper presents our effort to the Audio/Visual+ Emotion Challenge (AV+EC2015), whose goal is to predict the continuous values of the emotion dimensions arousal and valence from audio, visual and physiology modalities. The state of art classifier for dimensional recognition, long short term memory recurrent neural network (LSTM-RNN) is utilized. Except regular LSTM-RNN prediction architecture, two techniques are investigated for dimensional emotion recognition problem. The first one is ε -insensitive loss is utilized as the loss function to optimize. Compared to squared loss function, which is the most widely used loss function for dimension emotion recognition, ε -insensitive loss is more robust for the label noises and it can ignore small errors to get stronger correlation between predictions and labels. The other one is temporal pooling. This technique enables temporal modeling in the input features and increases the diversity of the features fed into the forward prediction architecture. Experiments results show the efficiency of key points of the proposed method and competitive results are obtained.