DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang
arXiv - CS - Graphics, published 2024-08-20 (arXiv:2408.10588)
Abstract
Although neural rendering has made significant advances in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the 3D Morphable Models (3DMMs) commonly used for 3D head avatars, we adopt an expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to produce photorealistic renderings with subtle and accurate facial expressions. Experiments on an existing dataset and on our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities for interactive AI agents.
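
The abstract describes a decoder, conditioned on body motion and a 2D-portrait-derived expression latent, that produces 3D Gaussian parameters arranged as maps in the UV layout. The sketch below is only an illustration of that decoding idea, not the authors' implementation: the module structure, the tensor sizes (a 75-D pose vector, a 128-D expression latent, 256x256 UV maps), and the plain convolutional upsampler are assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's code) of decoding driving
# signals into UV-space Gaussian maps, as described in the abstract.
import torch
import torch.nn as nn


class GaussianMapDecoder(nn.Module):
    """Map body motion + expression latent to per-texel 3D Gaussian attributes."""

    def __init__(self, pose_dim=75, expr_dim=128, latent_dim=256, uv_res=256):
        super().__init__()
        # Fuse the two driving signals into a single conditioning code.
        self.condition = nn.Sequential(
            nn.Linear(pose_dim + expr_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim * 4 * 4),
        )
        # Upsample a 4x4 feature grid to the UV resolution (6 doublings: 4 -> 256).
        blocks, ch = [], latent_dim
        for _ in range(6):
            blocks += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1), nn.ReLU()]
            ch //= 2
        self.upsample = nn.Sequential(*blocks)
        # Per-texel Gaussian attributes: xyz offset (3), rotation quaternion (4),
        # scale (3), opacity (1), RGB color (3) = 14 channels.
        self.head = nn.Conv2d(ch, 14, kernel_size=1)

    def forward(self, body_pose, expr_latent):
        code = self.condition(torch.cat([body_pose, expr_latent], dim=-1))
        feat = code.view(-1, code.shape[-1] // 16, 4, 4)
        maps = self.head(self.upsample(feat))
        xyz, rot, scale, opacity, rgb = maps.split([3, 4, 3, 1, 3], dim=1)
        return {
            "xyz_offset": xyz,                               # offset from a posed UV template
            "rotation": nn.functional.normalize(rot, dim=1), # unit quaternions
            "scale": torch.exp(scale),                       # keep scales positive
            "opacity": torch.sigmoid(opacity),
            "rgb": torch.sigmoid(rgb),
        }


# Usage: one frame driven by a body pose and an expression latent extracted
# from a 2D portrait image (both dimensions are illustrative).
decoder = GaussianMapDecoder()
out = decoder(torch.randn(1, 75), torch.randn(1, 128))
print({k: tuple(v.shape) for k, v in out.items()})
```

Each texel of the output maps would correspond to one 3D Gaussian anchored on the body surface via the UV layout; the resulting Gaussians can then be splatted by a standard 3DGS rasterizer to render the avatar.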