Uncertainty-Aware Semi-Supervised Learning of 3D Face Rigging from Single Image

Yong Zhao, Haifeng Chen, H. Sahli, Ke Lu, D. Jiang
{"title":"单幅图像三维人脸装配的不确定性感知半监督学习","authors":"Yong Zhao, Haifeng Chen, H. Sahli, Ke Lu, D. Jiang","doi":"10.1145/3503161.3548285","DOIUrl":null,"url":null,"abstract":"We present a method to rig 3D faces via Action Units (AUs), viewpoint and light direction, from single input image. Existing 3D methods for face synthesis and animation rely heavily on 3D morphable model (3DMM), which was built on 3D data and cannot provide intuitive expression parameters, while AU-driven 2D methods cannot handle head pose and lighting effect. We bridge the gap by integrating a recent 3D reconstruction method with 2D AU-driven method in a semi-supervised fashion. Built upon the auto-encoding 3D face reconstruction model that decouples depth, albedo, viewpoint and light without any supervision, we further decouple expression from identity for depth and albedo with a novel conditional feature translation module and pretrained critics for AU intensity estimation and image classification. Novel objective functions are designed using unlabeled in-the-wild images and in-door images with AU labels. We also leverage uncertainty losses to model the probably changing AU region of images as input noise for synthesis, and model the noisy AU intensity labels for intensity estimation of the AU critic. Experiments with face editing and animation on four datasets show that, compared with six state-of-the-art methods, our proposed method is superior and effective on expression consistency, identity similarity and pose similarity.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"2021 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Uncertainty-Aware Semi-Supervised Learning of 3D Face Rigging from Single Image\",\"authors\":\"Yong Zhao, Haifeng Chen, H. Sahli, Ke Lu, D. Jiang\",\"doi\":\"10.1145/3503161.3548285\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a method to rig 3D faces via Action Units (AUs), viewpoint and light direction, from single input image. Existing 3D methods for face synthesis and animation rely heavily on 3D morphable model (3DMM), which was built on 3D data and cannot provide intuitive expression parameters, while AU-driven 2D methods cannot handle head pose and lighting effect. We bridge the gap by integrating a recent 3D reconstruction method with 2D AU-driven method in a semi-supervised fashion. Built upon the auto-encoding 3D face reconstruction model that decouples depth, albedo, viewpoint and light without any supervision, we further decouple expression from identity for depth and albedo with a novel conditional feature translation module and pretrained critics for AU intensity estimation and image classification. Novel objective functions are designed using unlabeled in-the-wild images and in-door images with AU labels. We also leverage uncertainty losses to model the probably changing AU region of images as input noise for synthesis, and model the noisy AU intensity labels for intensity estimation of the AU critic. 
Experiments with face editing and animation on four datasets show that, compared with six state-of-the-art methods, our proposed method is superior and effective on expression consistency, identity similarity and pose similarity.\",\"PeriodicalId\":412792,\"journal\":{\"name\":\"Proceedings of the 30th ACM International Conference on Multimedia\",\"volume\":\"2021 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 30th ACM International Conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3503161.3548285\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3503161.3548285","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

We present a method to rig 3D faces via Action Units (AUs), viewpoint and light direction from a single input image. Existing 3D methods for face synthesis and animation rely heavily on the 3D morphable model (3DMM), which is built from 3D data and does not provide intuitive expression parameters, while AU-driven 2D methods cannot handle head pose and lighting effects. We bridge the gap by integrating a recent 3D reconstruction method with a 2D AU-driven method in a semi-supervised fashion. Building on an auto-encoding 3D face reconstruction model that decouples depth, albedo, viewpoint and light without any supervision, we further decouple expression from identity in the depth and albedo maps using a novel conditional feature translation module, together with pretrained critics for AU intensity estimation and image classification. Novel objective functions are designed using unlabeled in-the-wild images and indoor images with AU labels. We also leverage uncertainty losses to model the potentially changing AU regions of images as input noise for synthesis, and to model the noisy AU intensity labels for the intensity estimation of the AU critic. Experiments on face editing and animation across four datasets show that, compared with six state-of-the-art methods, our proposed method is superior in expression consistency, identity similarity and pose similarity.
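The abstract does not reproduce the paper's exact uncertainty formulation, but losses of this kind are commonly implemented as heteroscedastic regression in the style of Kendall & Gal (2017). The sketch below is a minimal PyTorch illustration of that idea for the noisy-AU-intensity case described above; the function name, tensor shapes and the 12-AU layout are assumptions for illustration, not the authors' code.

```python
import torch

def uncertainty_weighted_l2(pred: torch.Tensor,
                            target: torch.Tensor,
                            log_var: torch.Tensor) -> torch.Tensor:
    """Heteroscedastic (aleatoric) uncertainty loss, Kendall & Gal (2017):
    residuals on noisy labels are down-weighted where the predicted
    variance is high, while the log-variance term penalizes the trivial
    escape of predicting huge uncertainty everywhere."""
    precision = torch.exp(-log_var)  # 1 / sigma^2
    return (precision * (pred - target) ** 2 + log_var).mean()

# Hypothetical usage: an AU critic predicting 12 AU intensities per face
# plus a per-AU log-variance head; shapes and names are illustrative only.
pred = torch.randn(8, 12)                         # predicted AU intensities
target = torch.randn(8, 12)                       # noisy AU intensity labels
log_var = torch.zeros(8, 12, requires_grad=True)  # predicted log sigma^2
loss = uncertainty_weighted_l2(pred, target, log_var)
loss.backward()
```

Intuitively, samples whose labels the network deems unreliable contribute less to the gradient, which matches the abstract's description of modeling noisy AU intensity labels for the AU critic.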