Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data

2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date : 2019-10-01 DOI:10.1109/ICCV.2019.00082

Yong Zhang, Haiyong Jiang, Baoyuan Wu, Yanbo Fan, Q. Ji

Pages: 733-742

Citations: 25

Abstract

Facial action unit (AU) intensity estimation is a fundamental task in facial behaviour analysis. Most previous methods take a whole face image as input for intensity prediction. Because AUs are defined by their corresponding local appearance, a few patch-based methods use image features of local patches. However, these methods typically fuse local features through straightforward concatenation or summation. They also require fully annotated databases for model learning, which are expensive to acquire. In this paper, we propose a novel weakly supervised, patch-based deep model built on two types of attention mechanisms for joint intensity estimation of multiple AUs. The model consists of a feature fusion module and a label fusion module. We augment the attention mechanisms of both modules with a learnable task-related context, since one patch may play different roles in analyzing different AUs and each AU has its own temporal evolution rule. The context-aware feature fusion module captures spatial relationships among local patches, while the context-aware label fusion module captures the temporal dynamics of AUs. The latter enables the model to be trained on a partially annotated database. Experimental evaluations on two benchmark expression databases demonstrate the superior performance of the proposed method.
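The idea of fusing patch features with attention conditioned on a per-AU context can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract gives no layer details, so the scaled dot-product scoring, the function name `context_aware_fusion`, the array shapes, and the random inputs below are all assumptions made for illustration. Each AU gets its own context vector (random here, learned in a real model), which scores every patch so that the same patch can receive a different weight for different AUs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_aware_fusion(patch_feats, contexts):
    """Hypothetical attention-based fusion of patch features.

    patch_feats: (P, D) array, one feature vector per local patch.
    contexts:    (A, D) array, one task-related context vector per AU
                 (assumed learnable; fixed random values in this sketch).
    Returns fused features of shape (A, D) and attention weights (A, P).
    """
    d = patch_feats.shape[1]
    # Score every patch against every AU context (assumed scoring function).
    scores = contexts @ patch_feats.T / np.sqrt(d)   # (A, P)
    # Normalize over patches so each AU's weights sum to 1.
    weights = softmax(scores, axis=1)                # (A, P)
    # Weighted sum of patch features replaces plain concatenation or summation.
    fused = weights @ patch_feats                    # (A, D)
    return fused, weights

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(5, 8))   # 5 patches, 8-dim features (illustrative)
contexts = rng.normal(size=(3, 8))      # 3 AUs (illustrative)
fused, weights = context_aware_fusion(patch_feats, contexts)
```

In this sketch, `weights[a]` shows how strongly each patch contributes to AU `a`, which is the kind of AU-specific patch weighting that plain concatenation or summation cannot express.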

