MonoPartNeRF: Human reconstruction from monocular video via part-based neural radiance fields

IF 2.8 · Region 4 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Computers & Graphics-Uk · Pub Date: 2025-08-20 · DOI: 10.1016/j.cag.2025.104385

Yao Lu, Jiawei Li, Ming Jiang

Computers & Graphics-Uk, Volume 132, Article 104385. Journal article; not open access. Publisher page: https://www.sciencedirect.com/science/article/pii/S0097849325002262

Citations: 0

Abstract

In recent years, Neural Radiance Fields (NeRF) have achieved remarkable progress in dynamic human reconstruction and rendering. Part-based rendering paradigms, guided by human segmentation, allow for flexible parameter allocation based on structural complexity, thereby enhancing representational efficiency. However, existing methods still struggle with complex pose variations, often producing unnatural transitions at part boundaries and failing to reconstruct occluded regions accurately in monocular settings. We propose MonoPartNeRF, a novel framework for monocular dynamic human rendering that ensures smooth transitions and robust occlusion recovery. First, we build a bidirectional deformation model that combines rigid and non-rigid transformations to establish a continuous, reversible mapping between observation and canonical spaces. Sampling points are projected into a parameterized surface-time space (u, v, t) to better capture non-rigid motion. A consistency loss further suppresses deformation-induced artifacts and discontinuities. We introduce a part-based pose embedding mechanism that decomposes global pose vectors into local joint embeddings based on body regions. This is combined with keyframe pose retrieval and interpolation, along three orthogonal directions, to guide pose-aware feature sampling. A learnable appearance code is integrated via attention to model dynamic texture changes effectively. Experiments on the ZJU-MoCap and MonoCap datasets demonstrate that our method significantly outperforms prior approaches under complex pose and occlusion conditions, achieving superior joint alignment, texture fidelity, and structural continuity.
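The abstract does not give the exact form of the consistency loss. One common way to encourage a reversible mapping between observation and canonical space is a cycle-consistency penalty: map a point forward, map it back, and penalize the distance to where it started. The sketch below illustrates that idea only; it is not the authors' implementation. The function names, the z-axis rotation standing in for the rigid transform, and the sinusoidal offset standing in for a learned non-rigid network are all hypothetical.

```python
import math

def rigid_forward(p, angle, translation):
    """Observation -> canonical: rotate about the z-axis, then translate (toy rigid transform)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

def rigid_backward(p, angle, translation):
    """Canonical -> observation: exact inverse of rigid_forward."""
    tx, ty, tz = translation
    x, y, z = p[0] - tx, p[1] - ty, p[2] - tz
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y, z)

def nonrigid_offset(p, t, amplitude):
    """Hypothetical stand-in for a learned, time-dependent non-rigid offset."""
    x, y, z = p
    return (amplitude * math.sin(y + t),
            amplitude * math.sin(z + t),
            amplitude * math.sin(x + t))

def add(a, b):
    return tuple(ai + bi for ai, bi in zip(a, b))

def cycle_consistency_loss(points, t, angle, translation, amp_fwd, amp_bwd):
    """Mean squared distance between each point and its forward-then-backward image."""
    total = 0.0
    for p in points:
        # Forward pass: rigid transform followed by a non-rigid offset.
        q = rigid_forward(p, angle, translation)
        canonical = add(q, nonrigid_offset(q, t, amp_fwd))
        # Backward pass: non-rigid offset followed by the inverse rigid transform.
        r = add(canonical, nonrigid_offset(canonical, t, amp_bwd))
        recon = rigid_backward(r, angle, translation)
        total += sum((a - b) ** 2 for a, b in zip(recon, p))
    return total / len(points)

points = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.25)]
loss_rigid_only = cycle_consistency_loss(points, 0.0, 0.7, (0.1, 0.0, -0.2), 0.0, 0.0)
loss_mismatched = cycle_consistency_loss(points, 0.0, 0.7, (0.1, 0.0, -0.2), 0.05, 0.05)
print(loss_rigid_only, loss_mismatched)
```

With both offsets switched off, the two rigid maps are exact inverses and the loss is zero up to rounding. When the forward and backward offsets do not cancel, the loss becomes positive, and in a learned model that residual is the signal a consistency term would minimize.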



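Source journal: Computers & Graphics-Uk (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 5.30
Self-citation rate: 12.00%
Publication volume: 173
Review time: 38 days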

About the journal: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.