WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction.

IF 6.5

IEEE Transactions on Visualization and Computer Graphics. Pub Date: 2025-10-06. DOI: 10.1109/TVCG.2025.3618268

Zilong Wang, Zhiyang Dou, Yuan Liu, Cheng Lin, Xiao Dong, Yunhui Guo, Chenxu Zhang, Xin Li, Wenping Wang, Xiaohu Guo


Citations: 0

Abstract

We present WonderHuman, a method that reconstructs dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Previous dynamic human avatar reconstruction methods typically require the input video to cover the full body of the observed person. In daily practice, however, one typically has access to only limited viewpoints, such as monocular front-view videos, which makes it difficult for previous methods to reconstruct the unseen parts of the avatar. To tackle this issue, WonderHuman leverages 2D generative diffusion model priors to achieve high-quality, photorealistic reconstructions of dynamic human avatars from monocular videos, including accurate rendering of unseen body parts. Our approach introduces a Dual-Space Optimization technique that applies Score Distillation Sampling (SDS) in both canonical and observation spaces to ensure visual consistency and enhance realism in dynamic human reconstruction. Additionally, we present a View Selection strategy and Pose Feature Injection to enforce consistency between SDS predictions and observed data, preserving pose-dependent effects and improving the fidelity of the reconstructed avatar. In experiments, our method achieves state-of-the-art (SOTA) performance in producing photorealistic renderings from the given monocular video, particularly for the challenging unseen parts. The project page and source code can be found at https://wyiguanw.github.io/WonderHuman/.
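
The abstract names Score Distillation Sampling applied in both canonical and observation spaces but gives no formulas. The following is a minimal toy sketch of that idea in NumPy, not the authors' implementation. Everything in it is an assumption made for illustration: the "avatar" is a flat parameter vector `theta`, the pose warp `warp_to_observation` is a cyclic shift, the diffusion prior is replaced by an ideal denoiser that already knows a fixed target image, and the weighting `w(t) = 1 - alpha_bar` is one common choice rather than a value taken from the paper.

```python
import numpy as np


def sds_grad(image, target, alpha_bar, rng):
    """Toy SDS gradient with respect to a rendered image."""
    eps = rng.standard_normal(image.shape)
    # Forward diffusion: mix the rendering with Gaussian noise.
    noisy = np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps
    # Stand-in for a diffusion model (hypothetical): the noise estimate an
    # ideal denoiser would return if the clean image were exactly `target`.
    eps_pred = (noisy - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)
    weight = 1.0 - alpha_bar  # assumed weighting w(t)
    # SDS uses the noise residual directly and skips the denoiser Jacobian.
    return weight * (eps_pred - eps)


def warp_to_observation(canonical, shift):
    """Hypothetical pose warp: a cyclic shift stands in for deformation."""
    return np.roll(canonical, shift)


def dual_space_sds(theta, target_canon, shift, steps=200, lr=0.1, seed=0):
    """Gradient descent on theta using SDS gradients from both spaces."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float).copy()
    target_obs = warp_to_observation(target_canon, shift)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.1, 0.9)  # random noise level per step
        # Canonical space: the rendering is theta itself in this toy setup.
        g_canon = sds_grad(theta, target_canon, alpha_bar, rng)
        # Observation space: render through the warp, then score it.
        obs = warp_to_observation(theta, shift)
        g_obs = sds_grad(obs, target_obs, alpha_bar, rng)
        # Chain rule through the warp: undo the shift to reach theta.
        theta -= lr * (g_canon + np.roll(g_obs, -shift))
    return theta


if __name__ == "__main__":
    target = np.linspace(0.0, 1.0, 16)
    result = dual_space_sds(np.zeros(16), target, shift=3)
    print("max abs error:", np.max(np.abs(result - target)))
```

Because the stand-in denoiser is exact, the injected noise cancels and both gradients pull `theta` toward the same target. With a real diffusion prior the two terms are noisy and can disagree, which is the situation that summing the canonical-space and observation-space terms is meant to regularize.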

