FACSCaps: Pose-Independent Facial Action Coding with Capsules.

Itir Onal Ertugrul, László A. Jeni, Jeffrey F. Cohn
{"title":"FACSCaps: Pose-Independent Facial Action Coding with Capsules.","authors":"Itir Onal Ertugrul,&nbsp;Lászlό A Jeni,&nbsp;Jeffrey F Cohn","doi":"10.1109/CVPRW.2018.00287","DOIUrl":null,"url":null,"abstract":"<p><p>Most automated facial expression analysis methods treat the face as a 2D object, flat like a sheet of paper. That works well provided images are frontal or nearly so. In real- world conditions, moderate to large head rotation is common and system performance to recognize expression degrades. Multi-view Convolutional Neural Networks (CNNs) have been proposed to increase robustness to pose, but they require greater model sizes and may generalize poorly across views that are not included in the training set. We propose FACSCaps architecture to handle multi-view and multi-label facial action unit (AU) detection within a single model that can generalize to novel views. Additionally, FACSCaps's ability to synthesize faces enables insights into what is leaned by the model. FACSCaps models video frames using matrix capsules, where hierarchical pose relationships between face parts are built into internal representations. The model is trained by jointly optimizing a multi-label loss and the reconstruction accuracy. FACSCaps was evaluated using the FERA 2017 facial expression dataset that includes spontaneous facial expressions in a wide range of head orientations. FACSCaps outperformed both state-of-the-art CNNs and their temporal extensions.</p>","PeriodicalId":89346,"journal":{"name":"Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops","volume":"2018 ","pages":"2211-2220"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CVPRW.2018.00287","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Computer Vision and Pattern Recognition Workshops. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW.2018.00287","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2018/12/17 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

Abstract

Most automated facial expression analysis methods treat the face as a 2D object, flat like a sheet of paper. That works well provided images are frontal or nearly so. In real-world conditions, moderate to large head rotation is common, and expression-recognition performance degrades. Multi-view Convolutional Neural Networks (CNNs) have been proposed to increase robustness to pose, but they require larger models and may generalize poorly to views not included in the training set. We propose the FACSCaps architecture to handle multi-view and multi-label facial action unit (AU) detection within a single model that can generalize to novel views. Additionally, FACSCaps's ability to synthesize faces enables insights into what is learned by the model. FACSCaps models video frames using matrix capsules, where hierarchical pose relationships between face parts are built into the internal representations. The model is trained by jointly optimizing a multi-label loss and reconstruction accuracy. FACSCaps was evaluated on the FERA 2017 facial expression dataset, which includes spontaneous facial expressions in a wide range of head orientations. FACSCaps outperformed both state-of-the-art CNNs and their temporal extensions.
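
The abstract states that the model jointly optimizes a multi-label loss and reconstruction accuracy, but does not specify the loss functions or their weighting. The sketch below shows one plausible form of such a joint objective, assuming binary cross-entropy over per-AU sigmoid outputs, mean-squared reconstruction error, and a hypothetical weighting factor `lambda_recon`; none of these details are given in the text above.

```python
import torch.nn as nn

class JointAULoss(nn.Module):
    """Sketch of a joint objective: multi-label AU loss plus a reconstruction term.

    Assumptions (not from the abstract): BCE for the multi-label AU loss,
    MSE for reconstruction, and a hypothetical weight `lambda_recon`.
    """

    def __init__(self, lambda_recon: float = 0.0005):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()  # one sigmoid per action unit (multi-label)
        self.mse = nn.MSELoss()            # pixel-wise reconstruction error
        self.lambda_recon = lambda_recon

    def forward(self, au_logits, au_targets, reconstruction, frame):
        # au_logits, au_targets: (batch, num_aus)
        # reconstruction, frame: (batch, channels, height, width)
        classification = self.bce(au_logits, au_targets)
        recon = self.mse(reconstruction, frame)
        # Down-weighting the reconstruction term keeps it from dominating
        # the AU detection objective during training.
        return classification + self.lambda_recon * recon
```

In a setup like this, the reconstruction pathway serves a dual purpose: it regularizes the capsule representations, and, per the abstract, the synthesized faces offer a window into what the model has learned.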
