Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Network

Jinwei Gu, Xiaodong Yang, Shalini De Mello, J. Kautz
DOI: 10.1109/CVPR.2017.167
Published in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1531-1540
Publication date: 2017-07-21
Citations: 111

Abstract

Facial analysis in videos, including head pose estimation and facial landmark localization, is key for many applications such as facial animation capture, human activity recognition, and human-computer interaction. In this paper, we propose to use a recurrent neural network (RNN) for joint estimation and tracking of facial features in videos. We are inspired by the fact that the computation performed in an RNN bears resemblance to Bayesian filters, which have been used for tracking in many previous methods for facial analysis from videos. Bayesian filters used in these methods, however, require complicated, problem-specific design and tuning. In contrast, our proposed RNN-based method avoids such tracker-engineering by learning from training data, similar to how a convolutional neural network (CNN) avoids feature-engineering for image classification. As an end-to-end network, the proposed RNN-based method provides a generic and holistic solution for joint estimation and tracking of various types of facial features from consecutive video frames. Extensive experimental results on head pose estimation and facial landmark localization from videos demonstrate that the proposed RNN-based method outperforms frame-wise models and Bayesian filtering. In addition, we create a large-scale synthetic dataset for head pose estimation, with which we achieve state-of-the-art performance on a benchmark dataset.
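The abstract's central observation is that an RNN's recurrence resembles a Bayesian filter: a hidden state (the filter's belief) is propagated through time and updated with each frame's measurement, except that the transition and update functions are learned rather than hand-designed. The sketch below illustrates that structural analogy only; it is not the paper's architecture, and all dimensions, weights, and names are hypothetical placeholders (the paper uses learned CNN features and trained RNN weights).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: per-frame feature size, RNN state size,
# and pose output size (pitch, yaw, roll).
FEAT_DIM, HIDDEN_DIM, POSE_DIM = 8, 16, 3

# Random weights stand in for parameters that would be learned end-to-end.
W_xh = rng.standard_normal((HIDDEN_DIM, FEAT_DIM)) * 0.1
W_hh = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1
W_hy = rng.standard_normal((POSE_DIM, HIDDEN_DIM)) * 0.1


def rnn_track(frame_features):
    """Fuse per-frame measurements with a recurrent state, loosely
    mirroring a Bayesian filter's predict/update cycle."""
    h = np.zeros(HIDDEN_DIM)  # recurrent state ~ filter belief
    poses = []
    for x in frame_features:  # one measurement per video frame
        # Learned analogue of "predict from previous state, then
        # update with the current measurement".
        h = np.tanh(W_hh @ h + W_xh @ x)
        poses.append(W_hy @ h)  # read out the per-frame pose estimate
    return np.array(poses)


# Toy sequence: 5 frames of (hypothetical) per-frame CNN features.
features = rng.standard_normal((5, FEAT_DIM))
estimates = rnn_track(features)
print(estimates.shape)  # one 3-DoF pose estimate per frame
```

The key contrast with a hand-tuned Kalman or particle filter is that nothing here encodes a motion model explicitly: given training pairs of videos and pose labels, gradient descent would fit `W_hh`, `W_xh`, and `W_hy` jointly, which is the "avoids tracker-engineering" point the abstract makes.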