{"title":"野外情绪识别的神经网络","authors":"Michal Grosicki","doi":"10.1145/2663204.2666270","DOIUrl":null,"url":null,"abstract":"In this paper we present neural networks based method for emotion recognition. Proposed model was developed as part of 2014 Emotion Recognition in the Wild Challenge. It is composed of modality specific neural networks, which where trained separately on audio and video data extracted from short video clips taken from various movies. Each network was trained on frame-level data, which in later stages were aggregated by simple averaging of predicted class distributions for each clip. In the next stage various techniques for combining modalities where investigated with the best being support vector machine with RBF kernel. Our method achieved accuracy of 37.84%, which is better than 33.7% obtained by the best baseline model provided by organisers.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"116 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Neural Networks for Emotion Recognition in the Wild\",\"authors\":\"Michal Grosicki\",\"doi\":\"10.1145/2663204.2666270\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we present neural networks based method for emotion recognition. Proposed model was developed as part of 2014 Emotion Recognition in the Wild Challenge. It is composed of modality specific neural networks, which where trained separately on audio and video data extracted from short video clips taken from various movies. Each network was trained on frame-level data, which in later stages were aggregated by simple averaging of predicted class distributions for each clip. 
In the next stage various techniques for combining modalities where investigated with the best being support vector machine with RBF kernel. Our method achieved accuracy of 37.84%, which is better than 33.7% obtained by the best baseline model provided by organisers.\",\"PeriodicalId\":389037,\"journal\":{\"name\":\"Proceedings of the 16th International Conference on Multimodal Interaction\",\"volume\":\"116 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th International Conference on Multimodal Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2663204.2666270\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2663204.2666270","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Neural Networks for Emotion Recognition in the Wild
In this paper we present a neural-network-based method for emotion recognition. The proposed model was developed as part of the 2014 Emotion Recognition in the Wild Challenge. It is composed of modality-specific neural networks, which were trained separately on audio and video data extracted from short clips taken from various movies. Each network was trained on frame-level data, which in later stages was aggregated by simply averaging the predicted class distributions for each clip. Next, various techniques for combining modalities were investigated, the best being a support vector machine with an RBF kernel. Our method achieved an accuracy of 37.84%, better than the 33.7% obtained by the best baseline model provided by the organisers.
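The pipeline the abstract describes (per-frame class distributions averaged into clip-level features, then modality fusion with an RBF-kernel SVM) can be sketched as follows. This is a minimal illustration with randomly generated stand-in data; all names, shapes, and the 7-class assumption are ours, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in per-frame class-probability outputs from two hypothetical
# modality-specific networks (shapes and class count are assumptions):
# each clip yields a variable number of frames, each a distribution
# over 7 emotion classes.
rng = np.random.default_rng(0)
n_clips, n_classes = 20, 7

def clip_level(frame_probs):
    # Aggregate frame-level predictions into one clip-level vector
    # by simple averaging, as the abstract describes.
    return np.stack([p.mean(axis=0) for p in frame_probs])

audio_frames = [rng.dirichlet(np.ones(n_classes), size=int(rng.integers(10, 30)))
                for _ in range(n_clips)]
video_frames = [rng.dirichlet(np.ones(n_classes), size=int(rng.integers(10, 30)))
                for _ in range(n_clips)]
labels = rng.integers(0, n_classes, size=n_clips)

# Combine modalities: concatenate the clip-level distributions and
# classify with an RBF-kernel SVM, the fusion the abstract reports as best.
X = np.hstack([clip_level(audio_frames), clip_level(video_frames)])
clf = SVC(kernel="rbf").fit(X, labels)
preds = clf.predict(X)
```

In practice the SVM would be trained on held-out clip-level features and evaluated on a separate test split; the in-sample fit here is only to keep the sketch short.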