AvatarWild: Fully controllable head avatars in the wild

IF 3.8 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Visual Informatics Pub Date : 2024-09-01 DOI:10.1016/j.visinf.2024.09.001

Shaoxu Meng , Tong Wu , Fang-Lue Zhang , Shu-Yu Chen , Yuewen Ma , Wenbo Hu , Lin Gao

{"title":"AvatarWild: Fully controllable head avatars in the wild","authors":"Shaoxu Meng , Tong Wu , Fang-Lue Zhang , Shu-Yu Chen , Yuewen Ma , Wenbo Hu , Lin Gao","doi":"10.1016/j.visinf.2024.09.001","DOIUrl":null,"url":null,"abstract":"<div><div>Recent advancements in the field have resulted in significant progress in achieving realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties to existing methods of head avatar reconstruction. To address the challenge posed by video data captured with camera motion, we propose a novel method, AvatarWild, for reconstructing head avatars from monocular videos taken by consumer devices. Notably, our approach decouples the camera pose and head pose, allowing reconstructed avatars to be visualized with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module designed to augment local facial details without compromising viewpoint consistency. Our method demonstrates superior performance compared to existing approaches, as evidenced by reconstruction and animation results on both multi-view and single-view datasets. Remarkably, our approach stands out by exclusively relying on video data captured by portable devices, such as smartphones. This not only underscores the practicality of our method but also extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.</div></div>","PeriodicalId":36903,"journal":{"name":"Visual Informatics","volume":"8 3","pages":"Pages 96-106"},"PeriodicalIF":3.8000,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Informatics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468502X24000421","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Recent advancements in the field have resulted in significant progress in achieving realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties to existing methods of head avatar reconstruction. To address the challenge posed by video data captured with camera motion, we propose a novel method, AvatarWild, for reconstructing head avatars from monocular videos taken by consumer devices. Notably, our approach decouples the camera pose and head pose, allowing reconstructed avatars to be visualized with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module designed to augment local facial details without compromising viewpoint consistency. Our method demonstrates superior performance compared to existing approaches, as evidenced by reconstruction and animation results on both multi-view and single-view datasets. Remarkably, our approach stands out by exclusively relying on video data captured by portable devices, such as smartphones. This not only underscores the practicality of our method but also extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.

查看原文本刊更多论文

AvatarWild：完全可控的野生头像

该领域的最新进展是利用神经辐射场（NeRF）实现逼真的头部重建和操作。尽管取得了这些进展，但捕捉复杂的面部细节仍是一项长期挑战。此外，随手捕捉的输入信息涉及头部姿势和摄像机运动，给现有的头像重建方法带来了更多困难。为了应对相机运动视频数据带来的挑战，我们提出了一种新方法 AvatarWild，用于从消费类设备拍摄的单目视频中重建头像。值得注意的是，我们的方法将摄像机姿势和头部姿势分离开来，允许从新的视角以不同的姿势和表情可视化重建的头像。为了提高重建后的面部头像的视觉质量，我们引入了一个视图相关细节增强模块，旨在增强局部面部细节而不影响视点一致性。我们的方法在多视角和单视角数据集上的重建和动画结果表明，与现有方法相比，我们的方法具有更优越的性能。值得注意的是，我们的方法完全依赖于智能手机等便携设备捕获的视频数据，因此脱颖而出。这不仅强调了我们方法的实用性，还将其适用范围扩展到了对数据获取的可及性和便捷性至关重要的现实世界场景中。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊