Deep Sentiment Features of Context and Faces for Affective Video Analysis

C. Baecchi, Tiberio Uricchio, M. Bertini, A. Bimbo
{"title":"情感视频分析中语境与面孔的深层情感特征","authors":"C. Baecchi, Tiberio Uricchio, M. Bertini, A. Bimbo","doi":"10.1145/3078971.3079027","DOIUrl":null,"url":null,"abstract":"Given the huge quantity of hours of video available on video sharing platforms such as YouTube, Vimeo, etc. development of automatic tools that help users find videos that fit their interests has attracted the attention of both scientific and industrial communities. So far the majority of the works have addressed semantic analysis, to identify objects, scenes and events depicted in videos, but more recently affective analysis of videos has started to gain more attention. In this work we investigate the use of sentiment driven features to classify the induced sentiment of a video, i.e. the sentiment reaction of the user. Instead of using standard computer vision features such as CNN features or SIFT features trained to recognize objects and scenes, we exploit sentiment related features such as the ones provided by Deep-SentiBank, and features extracted from models that exploit deep networks trained on face expressions. We experiment on two recently introduced datasets: LIRIS-ACCEDE and MEDIAEVAL-2015, that provide sentiment annotations of a large set of short videos. We show that our approach not only outperforms the current state-of-the-art in terms of valence and arousal classification accuracy, but it also uses a smaller number of features, requiring thus less video processing.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Deep Sentiment Features of Context and Faces for Affective Video Analysis\",\"authors\":\"C. Baecchi, Tiberio Uricchio, M. Bertini, A. 
Bimbo\",\"doi\":\"10.1145/3078971.3079027\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given the huge quantity of hours of video available on video sharing platforms such as YouTube, Vimeo, etc. development of automatic tools that help users find videos that fit their interests has attracted the attention of both scientific and industrial communities. So far the majority of the works have addressed semantic analysis, to identify objects, scenes and events depicted in videos, but more recently affective analysis of videos has started to gain more attention. In this work we investigate the use of sentiment driven features to classify the induced sentiment of a video, i.e. the sentiment reaction of the user. Instead of using standard computer vision features such as CNN features or SIFT features trained to recognize objects and scenes, we exploit sentiment related features such as the ones provided by Deep-SentiBank, and features extracted from models that exploit deep networks trained on face expressions. We experiment on two recently introduced datasets: LIRIS-ACCEDE and MEDIAEVAL-2015, that provide sentiment annotations of a large set of short videos. 
We show that our approach not only outperforms the current state-of-the-art in terms of valence and arousal classification accuracy, but it also uses a smaller number of features, requiring thus less video processing.\",\"PeriodicalId\":403556,\"journal\":{\"name\":\"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3078971.3079027\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3078971.3079027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

Given the huge quantity of hours of video available on video sharing platforms such as YouTube and Vimeo, the development of automatic tools that help users find videos that fit their interests has attracted the attention of both the scientific and industrial communities. So far, most work has addressed semantic analysis, identifying the objects, scenes, and events depicted in videos, but more recently affective analysis of videos has started to gain attention. In this work we investigate the use of sentiment-driven features to classify the induced sentiment of a video, i.e. the sentiment reaction of the user. Instead of using standard computer vision features, such as CNN or SIFT features trained to recognize objects and scenes, we exploit sentiment-related features, such as those provided by Deep-SentiBank, and features extracted from models that exploit deep networks trained on facial expressions. We experiment on two recently introduced datasets, LIRIS-ACCEDE and MEDIAEVAL-2015, which provide sentiment annotations for a large set of short videos. We show that our approach not only outperforms the current state of the art in terms of valence and arousal classification accuracy, but also uses a smaller number of features, thus requiring less video processing.
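The pipeline the abstract describes, extracting per-frame sentiment-related descriptors, pooling them into a video-level vector, and classifying induced valence, can be illustrated with a minimal sketch. Everything below is a simulated stand-in: the random "frame features" replace the paper's actual Deep-SentiBank and facial-expression network outputs, and a simple nearest-centroid rule replaces whatever classifier the authors used.

```python
# Hypothetical sketch of video-level valence classification from pooled
# frame features. The feature extractor and data are SIMULATED stand-ins,
# not the paper's Deep-SentiBank or face-expression networks.
import numpy as np

rng = np.random.default_rng(0)

def frame_features(n_frames, dim, shift):
    # Stand-in for per-frame sentiment descriptors; `shift` simulates a
    # class-dependent bias in the feature space.
    return rng.normal(loc=shift, scale=1.0, size=(n_frames, dim))

def video_descriptor(frames):
    # Temporal average pooling: one fixed-size vector per video.
    return frames.mean(axis=0)

# Simulated dataset: 50 positive-valence videos (features shifted by +0.5)
# and 50 negative-valence videos (shifted by -0.5), 30 frames each.
X = np.stack([video_descriptor(frame_features(30, 64, s))
              for s in [+0.5] * 50 + [-0.5] * 50])
y = np.array([1] * 50 + [0] * 50)

# Nearest-centroid classifier on the pooled descriptors.
c_pos = X[y == 1].mean(axis=0)
c_neg = X[y == 0].mean(axis=0)
pred = (np.linalg.norm(X - c_pos, axis=1)
        < np.linalg.norm(X - c_neg, axis=1)).astype(int)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Average pooling keeps the video descriptor compact regardless of clip length, which is consistent with the abstract's point that fewer features mean less video processing; the real system would swap in learned sentiment features and a trained classifier.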