Deep learning vs. kernel methods: Performance for emotion prediction in videos

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) Pub Date : 2015-09-21 DOI:10.1109/ACII.2015.7344554

Yoann Baveye, E. Dellandréa, Christel Chamaret, Liming Luke Chen

Pages: 77-83

Citations: 68

Abstract

Driven mainly by advances in deep learning, performance in scene and object recognition has improved rapidly in recent years. More subjective recognition tasks, such as emotion prediction, have by contrast stagnated at moderate levels. In this context, can affective computational models benefit from the breakthroughs in deep learning? This paper proposes to bring the strengths of deep learning to emotion prediction in videos. The two main contributions are as follows: (i) a new, publicly available dataset is introduced, composed of 30 movies under Creative Commons licenses and continuously annotated along the induced valence and arousal axes; (ii) on this dataset, the performance of Convolutional Neural Networks (CNNs) trained through supervised fine-tuning, of Support Vector Machines for Regression (SVR), and of the combination of both (transfer learning) is computed and discussed. To the best of our knowledge, this is the first approach in the literature to use CNNs to predict dimensional affective scores from videos. The experimental results show that the limited size of the dataset prevents the learning or fine-tuning of CNN-based frameworks, but that transfer learning is a promising way to improve the performance of affective movie content analysis frameworks as long as very large datasets annotated along affective dimensions are not available.
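The transfer-learning configuration named in the abstract (features from a CNN fed to an SVR) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation. The feature vectors are synthetic stand-ins for activations that would come from a pretrained CNN, and the targets are made-up valence-like scores. A linear epsilon-insensitive SVR trained by stochastic subgradient descent is swapped in for whatever SVR solver and kernel the paper actually used. All function and variable names are hypothetical.

```python
import random


def predict(w, b, x):
    """Linear prediction w.x + b for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.05, C=1.0, lr=0.01, epochs=50, seed=0):
    """Fit a linear epsilon-insensitive SVR by stochastic subgradient descent.

    Objective: 0.5 * ||w||^2 + C * sum_i max(0, |w.x_i + b - y_i| - epsilon).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = predict(w, b, X[i]) - y[i]
            # Subgradient of the epsilon-insensitive loss: zero inside the tube.
            g = 1.0 if err > epsilon else (-1.0 if err < -epsilon else 0.0)
            for j in range(d):
                # The regulariser gradient w[j] is spread over the n samples.
                w[j] -= lr * (w[j] / n + C * g * X[i][j])
            b -= lr * C * g
    return w, b


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# ASSUMPTION: 3-D random vectors stand in for CNN activations, and the
# targets are synthetic valence-like scores, not data from the paper.
rng = random.Random(42)
features = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
scores = [0.8 * f[0] - 0.5 * f[1] + 0.1 + rng.gauss(0, 0.02) for f in features]

# Hold out the last 50 samples to estimate generalisation error.
train_X, test_X = features[:150], features[150:]
train_y, test_y = scores[:150], scores[150:]

w, b = train_linear_svr(train_X, train_y)
mse = mean_squared_error(test_y, [predict(w, b, x) for x in test_X])
print(f"weights={w}, bias={b:.3f}, test MSE={mse:.4f}")
```

The point of the sketch is the division of labour: the feature extractor is kept fixed and only the small regressor is fitted to the affective annotations, which is why this setup can remain usable when the annotated dataset is too small to train or fine-tune a CNN end to end.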

