Representing Animatable Avatar via Factorized Neural Fields

IF 2.9 · CAS Tier 4 (Computer Science) · JCR Q2 (Computer Science, Software Engineering)
Chunjin Song, Zhijie Wu, Bastian Wandt, Leonid Sigal, Helge Rhodin
{"title":"通过分解神经场表示可动画的化身","authors":"Chunjin Song,&nbsp;Zhijie Wu,&nbsp;Bastian Wandt,&nbsp;Leonid Sigal,&nbsp;Helge Rhodin","doi":"10.1111/cgf.70192","DOIUrl":null,"url":null,"abstract":"<p>For reconstructing high-fidelity human 3D models from monocular videos, it is crucial to maintain consistent large-scale body shapes along with finely matched subtle wrinkles. This paper explores how per-frame rendering results can be factorized into a pose-independent component and a corresponding pose-dependent counterpart to facilitate frame consistency at multiple scales. Pose adaptive texture features are further improved by restricting the frequency bands of these two components. Pose-independent outputs are expected to be low-frequency, while high-frequency information is linked to pose-dependent factors. We implement this with a dual-branch network. The first branch takes coordinates in the canonical space as input, while the second one additionally considers features outputted by the first branch and pose information of each frame. A final network integrates the information predicted by both branches and utilizes volume rendering to generate photo-realistic 3D human images. Through experiments, we demonstrate that our method consistently surpasses all state-of-the-art methods in preserving high-frequency details and ensuring consistent body contours. Our code is accessible at https://github.com/ChunjinSong/facavatar.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"44 5","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.70192","citationCount":"0","resultStr":"{\"title\":\"Representing Animatable Avatar via Factorized Neural Fields\",\"authors\":\"Chunjin Song,&nbsp;Zhijie Wu,&nbsp;Bastian Wandt,&nbsp;Leonid Sigal,&nbsp;Helge Rhodin\",\"doi\":\"10.1111/cgf.70192\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>For reconstructing high-fidelity human 3D models from monocular videos, it is crucial to maintain consistent large-scale body shapes along with finely matched subtle wrinkles. This paper explores how per-frame rendering results can be factorized into a pose-independent component and a corresponding pose-dependent counterpart to facilitate frame consistency at multiple scales. Pose adaptive texture features are further improved by restricting the frequency bands of these two components. Pose-independent outputs are expected to be low-frequency, while high-frequency information is linked to pose-dependent factors. We implement this with a dual-branch network. The first branch takes coordinates in the canonical space as input, while the second one additionally considers features outputted by the first branch and pose information of each frame. A final network integrates the information predicted by both branches and utilizes volume rendering to generate photo-realistic 3D human images. Through experiments, we demonstrate that our method consistently surpasses all state-of-the-art methods in preserving high-frequency details and ensuring consistent body contours. 
Our code is accessible at https://github.com/ChunjinSong/facavatar.</p>\",\"PeriodicalId\":10687,\"journal\":{\"name\":\"Computer Graphics Forum\",\"volume\":\"44 5\",\"pages\":\"\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.70192\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Graphics Forum\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/cgf.70192\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Graphics Forum","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/cgf.70192","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

For reconstructing high-fidelity human 3D models from monocular videos, it is crucial to maintain consistent large-scale body shapes along with finely matched subtle wrinkles. This paper explores how per-frame rendering results can be factorized into a pose-independent component and a corresponding pose-dependent counterpart to facilitate frame consistency at multiple scales. Pose-adaptive texture features are further improved by restricting the frequency bands of these two components: pose-independent outputs are expected to be low-frequency, while high-frequency information is linked to pose-dependent factors. We implement this with a dual-branch network. The first branch takes coordinates in the canonical space as input, while the second additionally considers features output by the first branch and the pose information of each frame. A final network integrates the information predicted by both branches and utilizes volume rendering to generate photo-realistic 3D human images. Through experiments, we demonstrate that our method consistently surpasses all state-of-the-art methods in preserving high-frequency details and ensuring consistent body contours. Our code is accessible at https://github.com/ChunjinSong/facavatar.
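The abstract outlines the whole pipeline: a pose-independent branch over canonical coordinates restricted to low frequencies, a pose-dependent branch that adds high-frequency detail, and a final head that fuses both for volume rendering. The authors' actual implementation is in the linked repository; the sketch below is only a minimal illustration of that factorization, assuming Fourier positional encodings as the band-limiting mechanism and an SMPL-style 72-D pose vector. All module names, layer sizes, and the `volume_render` helper are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_bands, start_band=0):
    """Fourier features covering frequency bands [start_band, start_band + num_bands)."""
    freqs = 2.0 ** torch.arange(start_band, start_band + num_bands, device=x.device)
    angles = x[..., None] * freqs                      # (..., dim, num_bands)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)                   # (..., dim * num_bands * 2)

class FactorizedAvatarField(nn.Module):
    """Illustrative dual-branch field: low-frequency pose-independent base,
    high-frequency pose-dependent detail, fused into RGB + density."""
    def __init__(self, pose_dim=72, feat_dim=64, low_bands=4, high_bands=6, hidden=128):
        super().__init__()
        self.low_bands, self.high_bands = low_bands, high_bands
        # Branch 1: never sees the pose; only low-frequency encodings of
        # canonical coordinates -> smooth, frame-consistent base appearance.
        self.base = nn.Sequential(
            nn.Linear(3 * low_bands * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim + 1),           # feature + density
        )
        # Branch 2: high-frequency coordinate encodings, the base feature,
        # and the per-frame pose -> wrinkles and other pose-driven detail.
        self.detail = nn.Sequential(
            nn.Linear(3 * high_bands * 2 + feat_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )
        # Final head fuses both branches into RGB for volume rendering.
        self.rgb = nn.Sequential(
            nn.Linear(feat_dim * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x_canonical, pose):
        low = positional_encoding(x_canonical, self.low_bands)
        base_out = self.base(low)
        base_feat, sigma = base_out[..., :-1], base_out[..., -1:]
        high = positional_encoding(x_canonical, self.high_bands, start_band=self.low_bands)
        detail_feat = self.detail(torch.cat([high, base_feat, pose], dim=-1))
        rgb = self.rgb(torch.cat([base_feat, detail_feat], dim=-1))
        return rgb, torch.relu(sigma)

def volume_render(rgb, sigma, deltas):
    """Standard NeRF-style alpha compositing along each ray."""
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)            # (n_rays, n_samples)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans
    return (weights[..., None] * rgb).sum(dim=1)                    # (n_rays, 3)

# Example: render one ray with 16 samples (illustrative shapes only).
model = FactorizedAvatarField()
x = torch.randn(1, 16, 3)           # canonical-space sample points
pose = torch.randn(1, 16, 72)       # per-frame pose, broadcast to samples
rgb, sigma = model(x, pose)
deltas = torch.full((1, 16), 0.01)  # distances between successive samples
pixel = volume_render(rgb, sigma, deltas)  # (1, 3)
```

Note how the factorization is enforced structurally rather than by a loss alone: the base branch never receives the pose, so it can only explain frame-consistent, low-frequency appearance, while high-frequency, pose-conditioned detail must come from the second branch.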

Source journal: Computer Graphics Forum (Engineering & Technology; Computer Science: Software Engineering)
CiteScore: 5.80
Self-citation rate: 12.00%
Articles per year: 175
Review time: 3-6 weeks
About the journal: Computer Graphics Forum is the official journal of Eurographics, published in cooperation with Wiley-Blackwell, and is a unique, international source of information for computer graphics professionals interested in graphics developments worldwide. It is now one of the leading journals for researchers, developers and users of computer graphics in both commercial and academic environments. The journal reports on the latest developments in the field throughout the world and covers all aspects of the theory, practice and application of computer graphics.