DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang
{"title":"DEGAS:全身高斯头像的详细表达","authors":"Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang","doi":"arxiv-2408.10588","DOIUrl":null,"url":null,"abstract":"Although neural rendering has made significant advancements in creating\nlifelike, animatable full-body and head avatars, incorporating detailed\nexpressions into full-body avatars remains largely unexplored. We present\nDEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for\nfull-body avatars with rich facial expressions. Trained on multiview videos of\na given subject, our method learns a conditional variational autoencoder that\ntakes both the body motion and facial expression as driving signals to generate\nGaussian maps in the UV layout. To drive the facial expressions, instead of the\ncommonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to\nadopt the expression latent space trained solely on 2D portrait images,\nbridging the gap between 2D talking faces and 3D avatars. Leveraging the\nrendering capability of 3DGS and the rich expressiveness of the expression\nlatent space, the learned avatars can be reenacted to reproduce photorealistic\nrendering images with subtle and accurate facial expressions. Experiments on an\nexisting dataset and our newly proposed dataset of full-body talking avatars\ndemonstrate the efficacy of our method. We also propose an audio-driven\nextension of our method with the help of 2D talking faces, opening new\npossibilities to interactive AI agents.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"40 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DEGAS: Detailed Expressions on Full-Body Gaussian Avatars\",\"authors\":\"Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang\",\"doi\":\"arxiv-2408.10588\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although neural rendering has made significant advancements in creating\\nlifelike, animatable full-body and head avatars, incorporating detailed\\nexpressions into full-body avatars remains largely unexplored. We present\\nDEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for\\nfull-body avatars with rich facial expressions. Trained on multiview videos of\\na given subject, our method learns a conditional variational autoencoder that\\ntakes both the body motion and facial expression as driving signals to generate\\nGaussian maps in the UV layout. To drive the facial expressions, instead of the\\ncommonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to\\nadopt the expression latent space trained solely on 2D portrait images,\\nbridging the gap between 2D talking faces and 3D avatars. Leveraging the\\nrendering capability of 3DGS and the rich expressiveness of the expression\\nlatent space, the learned avatars can be reenacted to reproduce photorealistic\\nrendering images with subtle and accurate facial expressions. Experiments on an\\nexisting dataset and our newly proposed dataset of full-body talking avatars\\ndemonstrate the efficacy of our method. 
We also propose an audio-driven\\nextension of our method with the help of 2D talking faces, opening new\\npossibilities to interactive AI agents.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":\"40 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.10588\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10588","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both the body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the commonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to adopt the expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to reproduce photorealistic rendering images with subtle and accurate facial expressions. Experiments on an existing dataset and our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities for interactive AI agents.
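The abstract gives only a high-level description of the conditioning pipeline, so the snippet below is purely an illustrative sketch, not the authors' architecture: it shows one plausible PyTorch shape for a decoder that fuses the two driving signals (a body-pose vector and a 2D expression latent) into a conditioning vector and upsamples it into per-texel Gaussian parameter maps in UV space. The class name, the dimensions (pose_dim, expr_dim, uv_res), and the 11-channel parameter layout (3 position offset + 4 rotation quaternion + 3 scale + 1 opacity; color omitted) are all assumptions for illustration, and the variational encoder and training losses of the actual conditional VAE are omitted entirely.

```python
# Hypothetical sketch of a conditional decoder producing UV-layout Gaussian
# parameter maps from body-motion and expression driving signals, in the
# spirit of the pipeline described in the abstract. All names, dimensions,
# and channel groupings are illustrative assumptions.
import torch
import torch.nn as nn

class GaussianMapDecoder(nn.Module):
    def __init__(self, pose_dim=72, expr_dim=128, uv_res=256):
        super().__init__()
        # Fuse the two driving signals into one 4x4 conditioning feature map.
        self.cond = nn.Sequential(
            nn.Linear(pose_dim + expr_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 512 * 4 * 4),
        )
        # Upsample 4x4 -> uv_res x uv_res, halving channels at each step.
        layers, ch, res = [], 512, 4
        while res < uv_res:
            layers += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                       nn.ReLU()]
            ch //= 2
            res *= 2
        # 11 channels per texel: 3 position offset + 4 rotation (quaternion)
        # + 3 scale + 1 opacity (appearance/color channels omitted here).
        layers.append(nn.Conv2d(ch, 11, 3, padding=1))
        self.up = nn.Sequential(*layers)

    def forward(self, pose, expr):
        x = self.cond(torch.cat([pose, expr], dim=-1))
        x = x.view(x.size(0), 512, 4, 4)
        maps = self.up(x)  # (B, 11, uv_res, uv_res)
        offset, rotation, scale, opacity = maps.split([3, 4, 3, 1], dim=1)
        return offset, rotation, scale, opacity

# Usage with dummy driving signals (batch of 2):
decoder = GaussianMapDecoder()
offset, rotation, scale, opacity = decoder(torch.randn(2, 72), torch.randn(2, 128))
# Each output is a UV map, e.g. offset has shape (2, 3, 256, 256).
```

In such a design, each texel of the UV maps parameterizes one 3D Gaussian anchored to the body surface, so a standard 3DGS rasterizer can render the decoded avatar; how DEGAS actually structures and supervises these maps is detailed in the paper itself.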