DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang
arXiv - CS - Graphics, published 2024-08-20 (arXiv:2408.10588)
Abstract
Although neural rendering has made significant advances in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the 3D Morphable Models (3DMMs) commonly used for 3D head avatars, we adopt an expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to produce photorealistic renderings with subtle and accurate facial expressions. Experiments on an existing dataset and on our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities for interactive AI agents.
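
The abstract describes a decoder, conditioned on body motion and a 2D-portrait-derived expression latent, that produces 3D Gaussian parameters arranged as maps in the UV layout. The sketch below is only an illustration of that decoding idea, not the authors' implementation: the module structure, the tensor sizes (a 75-D pose vector, a 128-D expression latent, 256x256 UV maps), and the plain convolutional upsampler are assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's code) of decoding driving
# signals into UV-space Gaussian maps, as described in the abstract.
import torch
import torch.nn as nn


class GaussianMapDecoder(nn.Module):
    """Map body motion + expression latent to per-texel 3D Gaussian attributes."""

    def __init__(self, pose_dim=75, expr_dim=128, latent_dim=256, uv_res=256):
        super().__init__()
        # Fuse the two driving signals into a single conditioning code.
        self.condition = nn.Sequential(
            nn.Linear(pose_dim + expr_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim * 4 * 4),
        )
        # Upsample a 4x4 feature grid to the UV resolution (6 doublings: 4 -> 256).
        blocks, ch = [], latent_dim
        for _ in range(6):
            blocks += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1), nn.ReLU()]
            ch //= 2
        self.upsample = nn.Sequential(*blocks)
        # Per-texel Gaussian attributes: xyz offset (3), rotation quaternion (4),
        # scale (3), opacity (1), RGB color (3) = 14 channels.
        self.head = nn.Conv2d(ch, 14, kernel_size=1)

    def forward(self, body_pose, expr_latent):
        code = self.condition(torch.cat([body_pose, expr_latent], dim=-1))
        feat = code.view(-1, code.shape[-1] // 16, 4, 4)
        maps = self.head(self.upsample(feat))
        xyz, rot, scale, opacity, rgb = maps.split([3, 4, 3, 1, 3], dim=1)
        return {
            "xyz_offset": xyz,                               # offset from a posed UV template
            "rotation": nn.functional.normalize(rot, dim=1), # unit quaternions
            "scale": torch.exp(scale),                       # keep scales positive
            "opacity": torch.sigmoid(opacity),
            "rgb": torch.sigmoid(rgb),
        }


# Usage: one frame driven by a body pose and an expression latent extracted
# from a 2D portrait image (both dimensions are illustrative).
decoder = GaussianMapDecoder()
out = decoder(torch.randn(1, 75), torch.randn(1, 128))
print({k: tuple(v.shape) for k, v in out.items()})
```

Each texel of the output maps would correspond to one 3D Gaussian anchored on the body surface via the UV layout; the resulting Gaussians can then be splatted by a standard 3DGS rasterizer to render the avatar.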