AvatarWild: Fully controllable head avatars in the wild

IF 3.8 3区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Shaoxu Meng , Tong Wu , Fang-Lue Zhang , Shu-Yu Chen , Yuewen Ma , Wenbo Hu , Lin Gao
{"title":"AvatarWild: Fully controllable head avatars in the wild","authors":"Shaoxu Meng ,&nbsp;Tong Wu ,&nbsp;Fang-Lue Zhang ,&nbsp;Shu-Yu Chen ,&nbsp;Yuewen Ma ,&nbsp;Wenbo Hu ,&nbsp;Lin Gao","doi":"10.1016/j.visinf.2024.09.001","DOIUrl":null,"url":null,"abstract":"<div><div>Recent advancements in the field have resulted in significant progress in achieving realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties to existing methods of head avatar reconstruction. To address the challenge posed by video data captured with camera motion, we propose a novel method, AvatarWild, for reconstructing head avatars from monocular videos taken by consumer devices. Notably, our approach decouples the camera pose and head pose, allowing reconstructed avatars to be visualized with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module designed to augment local facial details without compromising viewpoint consistency. Our method demonstrates superior performance compared to existing approaches, as evidenced by reconstruction and animation results on both multi-view and single-view datasets. Remarkably, our approach stands out by exclusively relying on video data captured by portable devices, such as smartphones. This not only underscores the practicality of our method but also extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.</div></div>","PeriodicalId":36903,"journal":{"name":"Visual Informatics","volume":"8 3","pages":"Pages 96-106"},"PeriodicalIF":3.8000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Informatics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468502X24000421","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Recent advancements in the field have resulted in significant progress in achieving realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties to existing methods of head avatar reconstruction. To address the challenge posed by video data captured with camera motion, we propose a novel method, AvatarWild, for reconstructing head avatars from monocular videos taken by consumer devices. Notably, our approach decouples the camera pose and head pose, allowing reconstructed avatars to be visualized with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module designed to augment local facial details without compromising viewpoint consistency. Our method demonstrates superior performance compared to existing approaches, as evidenced by reconstruction and animation results on both multi-view and single-view datasets. Remarkably, our approach stands out by exclusively relying on video data captured by portable devices, such as smartphones. This not only underscores the practicality of our method but also extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.
AvatarWild:完全可控的野生头像
该领域的最新进展是利用神经辐射场(NeRF)实现逼真的头部重建和操作。尽管取得了这些进展,但捕捉复杂的面部细节仍是一项长期挑战。此外,随手捕捉的输入信息涉及头部姿势和摄像机运动,给现有的头像重建方法带来了更多困难。为了应对相机运动视频数据带来的挑战,我们提出了一种新方法 AvatarWild,用于从消费类设备拍摄的单目视频中重建头像。值得注意的是,我们的方法将摄像机姿势和头部姿势分离开来,允许从新的视角以不同的姿势和表情可视化重建的头像。为了提高重建后的面部头像的视觉质量,我们引入了一个视图相关细节增强模块,旨在增强局部面部细节而不影响视点一致性。我们的方法在多视角和单视角数据集上的重建和动画结果表明,与现有方法相比,我们的方法具有更优越的性能。值得注意的是,我们的方法完全依赖于智能手机等便携设备捕获的视频数据,因此脱颖而出。这不仅强调了我们方法的实用性,还将其适用范围扩展到了对数据获取的可及性和便捷性至关重要的现实世界场景中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Visual Informatics
Visual Informatics Computer Science-Computer Graphics and Computer-Aided Design
CiteScore
6.70
自引率
3.30%
发文量
33
审稿时长
79 days
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信