Video-based Emotion Recognition Using Deeply-Supervised Neural Networks

Proceedings of the 20th ACM International Conference on Multimodal Interaction. Published: 2018-10-02. DOI: 10.1145/3242969.3264978

Yingruo Fan, J. Lam, V. Li


Citations: 66

Abstract


Emotion recognition (ER) from natural facial images and videos has been studied for several years and is a relatively active topic in affective computing. However, ER in the wild remains challenging because of the noise introduced by head pose, facial deformation, and illumination variation. To address this challenge, and motivated by recent progress in Convolutional Neural Networks (CNNs), we develop a novel deeply supervised CNN (DSN) architecture that uses multi-level, multi-scale features extracted from different convolutional layers to provide a richer representation for ER. By embedding a series of side-output layers, the DSN model provides class-wise supervision and integrates predictions from multiple layers. Our team ranked 3rd in the EmotiW 2018 challenge, with the model achieving an accuracy of 61.1%.
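The abstract describes side-output layers that each produce class-wise predictions, are each supervised against the true label, and have their predictions integrated into a final output. The plain-Python sketch below illustrates that general deep-supervision idea under stated assumptions: three side outputs, a fixed weighted-sum fusion, and seven emotion classes. The class list, the fusion weights, and the function names (`fuse_side_outputs`, `deeply_supervised_loss`) are hypothetical and are not taken from the paper; this is not the authors' implementation. In a real model the scores would come from small classifiers attached to different convolutional layers, and the fusion weights would typically be learned rather than fixed.

```python
import math
from typing import List, Sequence

# Hypothetical label set for illustration (seven basic emotion categories).
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of the true class under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def fuse_side_outputs(side_logits: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Combine per-layer class scores with a weighted sum (assumed fusion rule)."""
    num_classes = len(side_logits[0])
    return [sum(w * s[c] for w, s in zip(weights, side_logits))
            for c in range(num_classes)]


def deeply_supervised_loss(side_logits: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           label: int,
                           side_weight: float = 1.0) -> float:
    """One loss per side output (class-wise supervision) plus a loss on the fused output."""
    side_loss = sum(cross_entropy(s, label) for s in side_logits)
    fused_loss = cross_entropy(fuse_side_outputs(side_logits, weights), label)
    return side_weight * side_loss + fused_loss


# Made-up scores from three side outputs (shallow, middle, deep) for one face frame.
side = [
    [0.2, 0.1, 0.0, 1.5, 0.3, 0.1, 0.2],  # shallow layer
    [0.1, 0.0, 0.2, 2.0, 0.5, 0.0, 0.1],  # middle layer
    [0.0, 0.1, 0.1, 2.5, 0.4, 0.2, 0.0],  # deep layer
]
w = [0.2, 0.3, 0.5]  # hypothetical fusion weights

fused = fuse_side_outputs(side, w)
pred = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
loss = deeply_supervised_loss(side, w, label=EMOTIONS.index("Happy"))
print(pred)  # Happy
```

Supervising every side output pushes the intermediate layers to learn class-discriminative features directly, while the fused prediction is what would be used at inference time. Setting `side_weight=0.0` reduces the objective to a loss on the fused output alone.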
