FRPGS: Fast, Robust, and Photorealistic Monocular Dynamic Scene Reconstruction With Deformable 3D Gaussians

Impact factor: 11.1 | Zone 1, Engineering & Technology | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology | Pub Date: 2025-04-02 | DOI: 10.1109/TCSVT.2025.3557012

Wan Li;Xiao Pan;Jiaxin Lin;Ping Lu;Daquan Feng;Wenzhe Shi

Vol. 35, No. 9, pp. 9119-9131 | Full text: https://ieeexplore.ieee.org/document/10947553/

Citations: 0

Abstract


Dynamic reconstruction technology holds significant promise for visual and interactive applications. Current techniques based on 3D Gaussian Splatting deliver favorable results and fast reconstruction. However, as scenes grow larger, representing them with individual, independent Gaussians 1) causes instability in large-scale dynamic reconstruction, marked by abrupt deformation, and 2) incurs significant redundancy from heuristic per-Gaussian densification. To tackle these issues, we propose FRPGS, a joint-based Gaussian representation that learns global information and deformation with center Gaussians and generates neural Gaussians around them for local detail. Specifically, FRPGS employs center Gaussians initialized from point clouds, which are learned together with a deformation field to represent global relationships and dynamic motion over time. Then, for each center Gaussian, attribute networks generate neural Gaussians that are driven by their linked center Gaussian, ensuring structural integrity during movement within this joint-based representation. Finally, to reduce Gaussian redundancy, a densification strategy based on the average cumulative gradient of the associated neural Gaussians imposes strict limits on the growth of center Gaussians without compromising accuracy. Additionally, we established a large-scale dynamic indoor dataset at the MuLong Laboratory of ZTE Corporation. Evaluations demonstrate that FRPGS significantly outperforms state-of-the-art methods in both training efficiency and reconstruction quality, achieving an efficiency improvement of over 50% (up to 74%) on an RTX 4090. FRPGS also supports simultaneous 4K-resolution reconstruction of 60 frames.
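The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: neural Gaussians that follow their linked center Gaussian, and densification gated by the average cumulative gradient of a center's neural Gaussians. Every name here (CenterGaussian, neural_positions, accumulate, select_for_growth, threshold, max_new) is hypothetical, the fixed offsets stand in for what the paper's attribute networks would predict, and the deformation field is an arbitrary callable rather than a learned network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CenterGaussian:
    position: Vec3        # canonical position, e.g. taken from a point cloud
    offsets: List[Vec3]   # assumption: local offsets of its neural Gaussians
    grad_sum: float = 0.0  # gradient norms accumulated over its neural Gaussians
    grad_count: int = 0    # number of accumulated gradient samples


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def neural_positions(center: CenterGaussian,
                     deform: Callable[[Vec3, float], Vec3],
                     t: float) -> List[Vec3]:
    """Deform the center at time t, then place its neural Gaussians around it.

    Only the center is deformed; the children inherit its motion, so the
    local structure stays rigid relative to the center.
    """
    moved = add(center.position, deform(center.position, t))
    return [add(moved, off) for off in center.offsets]


def accumulate(center: CenterGaussian, grad_norms: List[float]) -> None:
    """Add one iteration's gradient norms from the center's neural Gaussians."""
    center.grad_sum += sum(grad_norms)
    center.grad_count += len(grad_norms)


def select_for_growth(centers: List[CenterGaussian],
                      threshold: float,
                      max_new: int) -> List[int]:
    """Return indices of centers whose average cumulative gradient exceeds
    the threshold, highest first, capped at max_new to limit growth."""
    scored = []
    for i, c in enumerate(centers):
        if c.grad_count == 0:
            continue  # no gradient statistics yet, so never grow this center
        avg = c.grad_sum / c.grad_count
        if avg > threshold:
            scored.append((avg, i))
    scored.sort(reverse=True)
    return [i for _, i in scored[:max_new]]
```

In this sketch the cap max_new plays the role of the "strict limits" on center growth; how FRPGS actually sets its threshold and limit is not stated in the abstract.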


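Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months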

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.