Capturing AU-Aware Facial Features and Their Latent Relations for Emotion Recognition in the Wild

Anbang Yao, Junchao Shao, Ningning Ma, Yurong Chen
{"title":"Capturing AU-Aware Facial Features and Their Latent Relations for Emotion Recognition in the Wild","authors":"Anbang Yao, Junchao Shao, Ningning Ma, Yurong Chen","doi":"10.1145/2818346.2830585","DOIUrl":null,"url":null,"abstract":"The Emotion Recognition in the Wild (EmotiW) Challenge has been held for three years. Previous winner teams primarily focus on designing specific deep neural networks or fusing diverse hand-crafted and deep convolutional features. They all neglect to explore the significance of the latent relations among changing features resulted from facial muscle motions. In this paper, we study this recognition challenge from the perspective of analyzing the relations among expression-specific facial features in an explicit manner. Our method has three key components. First, we propose a pair-wise learning strategy to automatically seek a set of facial image patches which are important for discriminating two particular emotion categories. We found these learnt local patches are in part consistent with the locations of expression-specific Action Units (AUs), thus the features extracted from such kind of facial patches are named AU-aware facial features. Second, in each pair-wise task, we use an undirected graph structure, which takes learnt facial patches as individual vertices, to encode feature relations between any two learnt facial patches. Finally, a robust emotion representation is constructed by concatenating all task-specific graph-structured facial feature relations sequentially. Extensive experiments on the EmotiW 2015 Challenge testify the efficacy of the proposed approach. Without using additional data, our final submissions achieved competitive results on both sub-challenges including the image based static facial expression recognition (we got 55.38% recognition accuracy outperforming the baseline 39.13% with a margin of 16.25%) and the audio-video based emotion recognition (we got 53.80% recognition accuracy outperforming the baseline 39.33% and the 2014 winner team's final result 50.37% with the margins of 14.47% and 3.43%, respectively).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"101","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2818346.2830585","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 101

Abstract

The Emotion Recognition in the Wild (EmotiW) Challenge has been held for three years. Previous winning teams have primarily focused on designing specific deep neural networks or on fusing diverse hand-crafted and deep convolutional features, and all of them neglect the significance of the latent relations among the changing features resulting from facial muscle motions. In this paper, we study this recognition challenge from the perspective of explicitly analyzing the relations among expression-specific facial features. Our method has three key components. First, we propose a pair-wise learning strategy to automatically seek a set of facial image patches that are important for discriminating two particular emotion categories. We find that these learnt local patches are in part consistent with the locations of expression-specific Action Units (AUs); the features extracted from such facial patches are therefore named AU-aware facial features. Second, in each pair-wise task, we use an undirected graph structure, which takes the learnt facial patches as individual vertices, to encode the feature relations between any two learnt facial patches. Finally, a robust emotion representation is constructed by sequentially concatenating all task-specific graph-structured facial feature relations. Extensive experiments on the EmotiW 2015 Challenge testify to the efficacy of the proposed approach. Without using additional data, our final submissions achieved competitive results on both sub-challenges: image-based static facial expression recognition (55.38% recognition accuracy, outperforming the 39.13% baseline by a margin of 16.25%) and audio-video based emotion recognition (53.80% recognition accuracy, outperforming the 39.33% baseline and the 2014 winning team's final result of 50.37% by margins of 14.47% and 3.43%, respectively).
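To make the three-step pipeline concrete, the following is a minimal Python/NumPy sketch of how a graph-structured representation of the kind described above could be assembled. The per-task patch selection, the choice of cosine similarity as the pairwise relation measure, and all function and variable names (`patch_relation`, `graph_relations`, `patches_per_task`, the 49-patch / 64-dimensional feature setup) are assumptions introduced for illustration; the abstract does not specify the actual feature extractors or relation encoding used in the paper.

```python
import itertools
import numpy as np

# Hypothetical illustration of the three-step pipeline in the abstract:
# (1) pair-wise tasks over emotion categories select informative patches,
# (2) an undirected graph over those patches encodes pairwise feature
#     relations, (3) all task-specific relation vectors are concatenated.

EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def patch_relation(f_i, f_j):
    """Relation between two patch feature vectors.
    Cosine similarity is an assumed placeholder; the abstract does not
    specify the relation measure used in the paper."""
    return float(np.dot(f_i, f_j) /
                 (np.linalg.norm(f_i) * np.linalg.norm(f_j) + 1e-8))

def graph_relations(patch_features, selected_patches):
    """Edges of a fully connected undirected graph whose vertices are the
    selected (AU-aware) patches; each edge carries one relation value."""
    return [patch_relation(patch_features[i], patch_features[j])
            for i, j in itertools.combinations(sorted(selected_patches), 2)]

def emotion_representation(patch_features, patches_per_task):
    """Concatenate the graph-structured relations of every pair-wise task
    (one task per unordered pair of emotion categories)."""
    rep = []
    for emo_a, emo_b in itertools.combinations(EMOTIONS, 2):
        selected = patches_per_task[(emo_a, emo_b)]  # learnt offline
        rep.extend(graph_relations(patch_features, selected))
    return np.asarray(rep)

# Example usage with random stand-in data: 49 patches with 64-d features,
# and 5 (hypothetical) learnt patches per pair-wise task.
rng = np.random.default_rng(0)
features = rng.standard_normal((49, 64))
tasks = {pair: rng.choice(49, size=5, replace=False)
         for pair in itertools.combinations(EMOTIONS, 2)}
print(emotion_representation(features, tasks).shape)  # 21 tasks x 10 edges = (210,)
```

With 7 emotion categories there are C(7,2) = 21 pair-wise tasks, and each task's 5-vertex graph contributes C(5,2) = 10 edge relations, giving a 210-dimensional representation in this toy configuration; the final descriptor would then be fed to a downstream classifier.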