Long short term memory recurrent neural network based encoding method for emotion recognition in video

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2016-03-20. DOI: 10.1109/ICASSP.2016.7472178

Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen

Citations: 23

Abstract

Human emotion is a temporally dynamic event that can be inferred from both audio and video feature sequences. In this paper we investigate a long short-term memory recurrent neural network (LSTM-RNN) based encoding method for categorical emotion recognition in video. The LSTM-RNN can incorporate both knowledge of how emotion evolves over long ranges of successive frames and emotion cues from isolated frames. After encoding, each video clip is represented by one vector per input feature sequence; these vectors contain both frame-level and sequence-level emotion information. The vectors are then concatenated and fed into a support vector machine (SVM) to obtain the final prediction. Extensive evaluations on the Emotion Recognition in the Wild Challenge 2015 (EmotiW2015) dataset show the effectiveness of the proposed encoding method, and competitive results are obtained. The final recognition accuracy reaches 46.38% on the audio-video emotion recognition sub-challenge, against a challenge baseline of 39.33%, an absolute gain of 7.05 percentage points.
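The encode-then-concatenate pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LSTM weights are random and untrained, the feature dimensions and sequence lengths are made up, and mean-pooling the hidden states into one vector per sequence is an assumption, since the abstract does not say how the clip vector is formed. All function and variable names (`make_lstm`, `encode_sequence`, `clip_vector`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_dim, hidden_dim, rng):
    """Random, untrained LSTM weights: one (W, b) pair per gate."""
    def gate():
        cols = input_dim + hidden_dim
        W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
             for _ in range(hidden_dim)]
        b = [0.0] * hidden_dim
        return W, b
    return {name: gate() for name in ("input", "forget", "output", "cand")}


def affine(W, b, v):
    """Compute W @ v + b for plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def encode_sequence(frames, params, hidden_dim):
    """Run the LSTM over all frames and mean-pool the hidden states.

    Mean pooling is an assumption made for this sketch.
    """
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    pooled = [0.0] * hidden_dim
    for x in frames:
        v = list(x) + h  # current frame joined with previous hidden state
        i = [sigmoid(a) for a in affine(*params["input"], v)]
        f = [sigmoid(a) for a in affine(*params["forget"], v)]
        o = [sigmoid(a) for a in affine(*params["output"], v)]
        g = [math.tanh(a) for a in affine(*params["cand"], v)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        pooled = [p + hj for p, hj in zip(pooled, h)]
    return [p / len(frames) for p in pooled]


rng = random.Random(0)
# Made-up feature sequences: 20 audio frames (6-dim), 12 video frames (8-dim).
audio = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(20)]
video = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(12)]

audio_lstm = make_lstm(6, 4, rng)
video_lstm = make_lstm(8, 5, rng)

# One vector per feature sequence, concatenated into a single clip vector.
clip_vector = (encode_sequence(audio, audio_lstm, 4)
               + encode_sequence(video, video_lstm, 5))
print(len(clip_vector))  # 9
```

In a full system the LSTMs would first be trained on emotion labels, and `clip_vector` would be computed for every clip and passed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`, which is one possible choice rather than the tool named in the source).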
