Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang
{"title":"DEGAS:全身高斯头像的详细表达","authors":"Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang","doi":"arxiv-2408.10588","DOIUrl":null,"url":null,"abstract":"Although neural rendering has made significant advancements in creating\nlifelike, animatable full-body and head avatars, incorporating detailed\nexpressions into full-body avatars remains largely unexplored. We present\nDEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for\nfull-body avatars with rich facial expressions. Trained on multiview videos of\na given subject, our method learns a conditional variational autoencoder that\ntakes both the body motion and facial expression as driving signals to generate\nGaussian maps in the UV layout. To drive the facial expressions, instead of the\ncommonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to\nadopt the expression latent space trained solely on 2D portrait images,\nbridging the gap between 2D talking faces and 3D avatars. Leveraging the\nrendering capability of 3DGS and the rich expressiveness of the expression\nlatent space, the learned avatars can be reenacted to reproduce photorealistic\nrendering images with subtle and accurate facial expressions. Experiments on an\nexisting dataset and our newly proposed dataset of full-body talking avatars\ndemonstrate the efficacy of our method. We also propose an audio-driven\nextension of our method with the help of 2D talking faces, opening new\npossibilities to interactive AI agents.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"40 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DEGAS: Detailed Expressions on Full-Body Gaussian Avatars\",\"authors\":\"Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang\",\"doi\":\"arxiv-2408.10588\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although neural rendering has made significant advancements in creating\\nlifelike, animatable full-body and head avatars, incorporating detailed\\nexpressions into full-body avatars remains largely unexplored. We present\\nDEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for\\nfull-body avatars with rich facial expressions. Trained on multiview videos of\\na given subject, our method learns a conditional variational autoencoder that\\ntakes both the body motion and facial expression as driving signals to generate\\nGaussian maps in the UV layout. To drive the facial expressions, instead of the\\ncommonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to\\nadopt the expression latent space trained solely on 2D portrait images,\\nbridging the gap between 2D talking faces and 3D avatars. Leveraging the\\nrendering capability of 3DGS and the rich expressiveness of the expression\\nlatent space, the learned avatars can be reenacted to reproduce photorealistic\\nrendering images with subtle and accurate facial expressions. Experiments on an\\nexisting dataset and our newly proposed dataset of full-body talking avatars\\ndemonstrate the efficacy of our method. 
We also propose an audio-driven\\nextension of our method with the help of 2D talking faces, opening new\\npossibilities to interactive AI agents.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":\"40 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.10588\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10588","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DEGAS: Detailed Expressions on Full-Body Gaussian Avatars
Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the 3D Morphable Models (3DMMs) commonly used for 3D head avatars, we propose to adopt an expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to reproduce photorealistic renderings with subtle and accurate facial expressions. Experiments on an existing dataset and on our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities for interactive AI agents.
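To make the abstract's core mechanism concrete, the sketch below illustrates the general idea of a conditional decoder that turns the two driving signals (a body-motion vector and an expression latent extracted from 2D portrait frames) into per-texel 3D Gaussian parameters laid out in UV space. This is a minimal, hypothetical illustration, not the paper's implementation: the module name, tensor shapes, layer sizes, and the 14-channel parameter split are all assumptions, and the VAE encoder, the expression encoder for 2D portraits, and the 3DGS rasterizer are omitted.

```python
# Hypothetical sketch: driving signals -> UV-space Gaussian parameter maps.
# Shapes, channel counts, and the architecture are illustrative assumptions.
import torch
import torch.nn as nn


class UVGaussianDecoder(nn.Module):
    def __init__(self, pose_dim=72, expr_dim=128, uv_res=256):
        super().__init__()
        # Fuse body motion and expression latent into a small spatial feature map.
        self.fuse = nn.Linear(pose_dim + expr_dim, 512 * 4 * 4)
        # Upsample 4x4 -> uv_res x uv_res with a plain convolutional decoder.
        blocks, ch, size = [], 512, 4
        while size < uv_res:
            blocks += [
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(ch, max(ch // 2, 64), 3, padding=1),
                nn.ReLU(inplace=True),
            ]
            ch, size = max(ch // 2, 64), size * 2
        # 14 channels per texel: 3 position offset, 4 rotation (quaternion),
        # 3 log-scale, 1 opacity logit, 3 RGB color.
        blocks += [nn.Conv2d(ch, 14, 3, padding=1)]
        self.decoder = nn.Sequential(*blocks)

    def forward(self, pose, expr):
        # pose: (B, pose_dim) body-motion parameters; expr: (B, expr_dim)
        # expression latent from a 2D portrait encoder (not shown here).
        x = self.fuse(torch.cat([pose, expr], dim=-1)).view(-1, 512, 4, 4)
        maps = self.decoder(x)  # (B, 14, uv_res, uv_res)
        offset, rot, log_scale, opacity, color = torch.split(
            maps, [3, 4, 3, 1, 3], dim=1)
        return {
            "offset": offset,  # added to posed UV positions
            "rotation": nn.functional.normalize(rot, dim=1),
            "scale": log_scale.exp(),
            "opacity": torch.sigmoid(opacity),
            "color": torch.sigmoid(color),
        }


# Usage sketch: one Gaussian per UV texel, later rasterized with a 3DGS renderer.
decoder = UVGaussianDecoder()
gaussians = decoder(torch.randn(1, 72), torch.randn(1, 128))
print({k: tuple(v.shape) for k, v in gaussians.items()})
```

Under these assumptions, the audio-driven extension mentioned in the abstract would amount to replacing the expression latent with one predicted from speech by a 2D talking-face model, leaving the decoder and the 3DGS rendering stage unchanged.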