Koichiro Niinuma, Itir Onal Ertugrul, J. Cohn, László A. Jeni
{"title":"个性化面部动作估计的面部表情操纵","authors":"Koichiro Niinuma, Itir Onal Ertugrul, J. Cohn, László A. Jeni","doi":"10.3389/frsip.2022.861641","DOIUrl":null,"url":null,"abstract":"Limited sizes of annotated video databases of spontaneous facial expression, imbalanced action unit labels, and domain shift are three main obstacles in training models to detect facial actions and estimate their intensity. To address these problems, we propose an approach that incorporates facial expression generation for facial action unit intensity estimation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and trains a GAN-based network to synthesize novel images with facial action units of interest. We leverage the synthetic images to achieve two goals: 1) generating AU-balanced databases, and 2) tackling domain shift with personalized networks. To generate a balanced database, we synthesize expressions with varying AU intensities and perform semantic resampling. Our experimental results on FERA17 show that networks trained on synthesized facial expressions outperform those trained on actual facial expressions and surpass current state-of-the-art approaches. To tackle domain shift, we propose personalizing pretrained networks. We generate synthetic expressions of each target subject with varying AU intensity labels and use the person-specific synthetic images to fine-tune pretrained networks. To evaluate performance of the personalized networks, we use DISFA and PAIN databases. Personalized networks, which require only a single image from each target subject to generate synthetic images, achieved significant improvement in generalizing to unseen domains.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"10 1","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2022-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Facial Expression Manipulation for Personalized Facial Action Estimation\",\"authors\":\"Koichiro Niinuma, Itir Onal Ertugrul, J. Cohn, László A. Jeni\",\"doi\":\"10.3389/frsip.2022.861641\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Limited sizes of annotated video databases of spontaneous facial expression, imbalanced action unit labels, and domain shift are three main obstacles in training models to detect facial actions and estimate their intensity. To address these problems, we propose an approach that incorporates facial expression generation for facial action unit intensity estimation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and trains a GAN-based network to synthesize novel images with facial action units of interest. We leverage the synthetic images to achieve two goals: 1) generating AU-balanced databases, and 2) tackling domain shift with personalized networks. To generate a balanced database, we synthesize expressions with varying AU intensities and perform semantic resampling. Our experimental results on FERA17 show that networks trained on synthesized facial expressions outperform those trained on actual facial expressions and surpass current state-of-the-art approaches. To tackle domain shift, we propose personalizing pretrained networks. We generate synthetic expressions of each target subject with varying AU intensity labels and use the person-specific synthetic images to fine-tune pretrained networks. 
To evaluate performance of the personalized networks, we use DISFA and PAIN databases. Personalized networks, which require only a single image from each target subject to generate synthetic images, achieved significant improvement in generalizing to unseen domains.\",\"PeriodicalId\":93557,\"journal\":{\"name\":\"Frontiers in signal processing\",\"volume\":\"10 1\",\"pages\":\"\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2022-04-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in signal processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/frsip.2022.861641\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in signal processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frsip.2022.861641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Facial Expression Manipulation for Personalized Facial Action Estimation
Limited sizes of annotated video databases of spontaneous facial expression, imbalanced action unit labels, and domain shift are three main obstacles in training models to detect facial actions and estimate their intensity. To address these problems, we propose an approach that incorporates facial expression generation for facial action unit intensity estimation. Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and trains a GAN-based network to synthesize novel images with facial action units of interest. We leverage the synthetic images to achieve two goals: 1) generating AU-balanced databases, and 2) tackling domain shift with personalized networks. To generate a balanced database, we synthesize expressions with varying AU intensities and perform semantic resampling. Our experimental results on FERA17 show that networks trained on synthesized facial expressions outperform those trained on actual facial expressions and surpass current state-of-the-art approaches. To tackle domain shift, we propose personalizing pretrained networks. We generate synthetic expressions of each target subject with varying AU intensity labels and use the person-specific synthetic images to fine-tune pretrained networks. To evaluate the performance of the personalized networks, we use the DISFA and PAIN databases. Personalized networks, which require only a single image from each target subject to generate synthetic images, achieved significant improvements in generalizing to unseen domains.
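The personalization step described in the abstract can be pictured as a standard fine-tuning loop: a network pretrained on AU-balanced synthetic data is further trained on the synthetic expressions generated for a single target subject. The sketch below is a minimal, hypothetical illustration in PyTorch, assuming the model maps a face image to a vector of per-AU intensities and that the person-specific synthetic frames carry their target AU intensity labels; the function name `personalize` and the hyperparameter values are illustrative and do not reflect the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def personalize(pretrained_model, person_synth_dataset, epochs=5, lr=1e-5, device="cpu"):
    """Fine-tune a pretrained AU-intensity regressor on one subject's
    synthetic expressions (illustrative sketch, not the authors' code).

    `person_synth_dataset` is assumed to yield (image_tensor, au_intensities)
    pairs, where au_intensities is a float vector with one entry per AU.
    """
    model = pretrained_model.to(device)
    loader = DataLoader(person_synth_dataset, batch_size=32, shuffle=True)
    criterion = nn.MSELoss()                      # intensity regression loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for images, au_intensities in loader:
            images = images.to(device)
            targets = au_intensities.to(device).float()
            optimizer.zero_grad()
            preds = model(images)                 # shape: (batch, num_AUs)
            loss = criterion(preds, targets)
            loss.backward()
            optimizer.step()
    return model
```

As the abstract notes, only a single real image per target subject is needed to drive the expression synthesis; the fine-tuning itself then relies entirely on synthetic frames, which is what lets the personalized network adapt to unseen domains such as DISFA and PAIN.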