Multimodal Continuous Emotion Recognition with Data Augmentation Using Recurrent Neural Networks
Jian Huang, Ya Li, J. Tao, Zheng Lian, Mingyue Niu, Minghao Yang
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018-10-15
DOI: 10.1145/3266302.3266304 (https://doi.org/10.1145/3266302.3266304)
Citations: 22
Abstract
This paper presents our efforts for the Cross-cultural Emotion Sub-challenge of the Audio/Visual Emotion Challenge (AVEC) 2018, whose goal is to predict the level of three emotional dimensions time-continuously in a cross-cultural setup. We extract emotional features from the audio, visual and textual modalities. As the regressor we use a long short-term memory recurrent neural network (LSTM-RNN), the state-of-the-art model for continuous emotion recognition. We augment the training data by replacing the original training samples with shorter overlapping samples extracted from them, which multiplies the number of training samples and also benefits the training of a temporal emotion model with the LSTM-RNN. In addition, we explore two strategies for reducing the influence of the interlocutor in order to improve performance. We also compare the performance of feature-level fusion and decision-level fusion. The experimental results show the effectiveness of the proposed method, and competitive results are obtained.
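The augmentation step described in the abstract, replacing each training sequence with shorter overlapping sub-sequences, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `overlapping_segments`, the `window` and `hop` parameters, and the toy data are all assumptions made for illustration, and the abstract does not state the segment length or overlap actually used.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector per time step


def overlapping_segments(
    features: Sequence[Frame],
    labels: Sequence[float],
    window: int,
    hop: int,
) -> List[Tuple[List[Frame], List[float]]]:
    """Split one labelled sequence into shorter, overlapping training samples.

    `window` (segment length) and `hop` (shift between segment starts) are
    hypothetical parameters; hop < window gives overlapping segments.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")

    n = len(features)
    if n <= window:
        # The sequence is already short enough: keep it as a single sample.
        return [(list(features), list(labels))]

    starts = list(range(0, n - window + 1, hop))
    # Add a final window flush with the end so trailing frames are not dropped.
    if starts[-1] != n - window:
        starts.append(n - window)

    return [
        (list(features[s:s + window]), list(labels[s:s + window]))
        for s in starts
    ]


# Toy recording (assumed data): 10 time steps, 2 features and 1 label per step.
toy_features = [[float(t), float(t) * 0.5] for t in range(10)]
toy_labels = [0.1 * t for t in range(10)]

segments = overlapping_segments(toy_features, toy_labels, window=4, hop=2)
print(len(segments))  # one original sequence becomes several shorter samples
```

Each returned pair keeps features and labels aligned frame by frame, so the segments could be fed to a sequence regressor in place of the original recording. Choosing `hop` smaller than `window` is what produces the overlap and the increase in sample count.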