Look at the Sky: Sky-aware Efficient 3D Gaussian Splatting in the Wild.

Yuze Wang, Junyi Wang, Ruicheng Gao, Yansong Qu, Wantong Duan, Shuo Yang, Yue Qi
{"title":"Look at the Sky: Sky-aware Efficient 3D Gaussian Splatting in the Wild.","authors":"Yuze Wang, Junyi Wang, Ruicheng Gao, Yansong Qu, Wantong Duan, Shuo Yang, Yue Qi","doi":"10.1109/TVCG.2025.3549187","DOIUrl":null,"url":null,"abstract":"<p><p>Photos taken in unconstrained tourist environments often present challenges for accurate 3D scene reconstruction due to variable appearances and transient occlusions, which can introduce artifacts in novel view synthesis. Recently, in-the-wild 3D scene reconstruction has been achieved realistic rendering with Neural Radiance Fields (NeRFs). With the advancement of 3D Gaussian Splatting (3DGS), some methods also attempt to reconstruct 3D scenes from unconstrained photo collections and achieve real-time rendering. However, the rapid convergence of 3DGS is misaligned with the slower convergence of neural network-based appearance encoder and transient mask predictor, hindering the reconstruction efficiency. To address this, we propose a novel sky-aware framework for scene reconstruction from unconstrained photo collection using 3DGS. Firstly, we observe that the learnable per-image transient mask predictor in previous work is unnecessary. By introducing a simple yet efficient greedy supervision strategy, we directly utilize the pseudo mask generated by a pre-trained semantic segmentation network as the transient mask, thereby achieving more efficient and higher quality in-the-wild 3D scene reconstruction. Secondly, we find that separately estimating appearance embeddings for the sky and building significantly improves reconstruction efficiency and accuracy. We analyze the underlying reasons and introduce a neural sky module to generate diverse skies from latent sky embeddings extract from unconstrained images. Finally, we propose a mutual distillation learning strategy to constrain sky and building appearance embeddings within the same latent space, further enhancing reconstruction efficiency and quality. Extensive experiments on multiple datasets demonstrate that the proposed framework outperforms existing methods in novel view and appearance synthesis, offering superior rendering quality with faster convergence and rendering speed.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2025.3549187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Photos taken in unconstrained tourist environments often present challenges for accurate 3D scene reconstruction due to variable appearances and transient occlusions, which can introduce artifacts in novel view synthesis. Recently, in-the-wild 3D scene reconstruction has achieved realistic rendering with Neural Radiance Fields (NeRFs). With the advancement of 3D Gaussian Splatting (3DGS), some methods also attempt to reconstruct 3D scenes from unconstrained photo collections and achieve real-time rendering. However, the rapid convergence of 3DGS is misaligned with the slower convergence of the neural-network-based appearance encoder and transient mask predictor, hindering reconstruction efficiency. To address this, we propose a novel sky-aware framework for scene reconstruction from unconstrained photo collections using 3DGS. First, we observe that the learnable per-image transient mask predictor used in previous work is unnecessary. By introducing a simple yet efficient greedy supervision strategy, we directly use the pseudo mask generated by a pre-trained semantic segmentation network as the transient mask, thereby achieving more efficient and higher-quality in-the-wild 3D scene reconstruction. Second, we find that separately estimating appearance embeddings for the sky and buildings significantly improves reconstruction efficiency and accuracy. We analyze the underlying reasons and introduce a neural sky module that generates diverse skies from latent sky embeddings extracted from unconstrained images. Finally, we propose a mutual distillation learning strategy that constrains the sky and building appearance embeddings to the same latent space, further enhancing reconstruction efficiency and quality. Extensive experiments on multiple datasets demonstrate that the proposed framework outperforms existing methods in novel view and appearance synthesis, offering superior rendering quality with faster convergence and higher rendering speed.
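The abstract describes two of the framework's ingredients only at a high level: using the pseudo mask from a pre-trained semantic segmentation network directly as the transient mask in the photometric loss, and tying per-image sky and building appearance embeddings together with a mutual distillation term. The following PyTorch sketch illustrates one plausible reading of those two ideas; the class IDs, embedding dimensions, loss weights, and the stop-gradient form of the distillation term are all illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch (assumptions, not the paper's implementation):
# (1) pixels labelled as transient classes by a pre-trained segmentation network
#     are excluded from the photometric loss via a pseudo mask, and
# (2) per-image sky and building appearance embeddings are pulled into a shared
#     latent space by a symmetric mutual-distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Semantic classes treated as transient occluders (illustrative IDs,
# e.g. person / rider / car in a Cityscapes-style label map).
TRANSIENT_CLASSES = (11, 12, 13)


def masked_photometric_loss(rendered, target, semantic_labels):
    """L1 loss over pixels not covered by the pseudo transient mask.

    rendered, target: (3, H, W) images; semantic_labels: (H, W) integer label
    map produced by a pre-trained semantic segmentation network.
    """
    transient = torch.zeros_like(semantic_labels, dtype=torch.bool)
    for c in TRANSIENT_CLASSES:
        transient |= semantic_labels == c
    keep = (~transient).unsqueeze(0).float()  # (1, H, W), broadcast over channels
    denom = (keep.sum() * rendered.shape[0]).clamp(min=1.0)
    return (keep * (rendered - target).abs()).sum() / denom


class AppearanceEmbeddings(nn.Module):
    """Per-image sky and building appearance embeddings in a shared latent space."""

    def __init__(self, num_images, dim=32):
        super().__init__()
        self.sky = nn.Embedding(num_images, dim)
        self.building = nn.Embedding(num_images, dim)

    def mutual_distillation_loss(self, image_ids):
        """Symmetric term pulling the two embeddings of each image toward each other.

        A stop-gradient on each side (each branch distilling into the other) is
        one plausible realisation of "constraining both embeddings to the same
        latent space"; the paper's exact loss is not given in the abstract.
        """
        s = self.sky(image_ids)
        b = self.building(image_ids)
        return 0.5 * (F.mse_loss(s, b.detach()) + F.mse_loss(b, s.detach()))


# Usage example with dummy data.
if __name__ == "__main__":
    emb = AppearanceEmbeddings(num_images=100)
    ids = torch.tensor([0, 1, 2])
    rendered = torch.rand(3, 64, 64)
    target = torch.rand(3, 64, 64)
    labels = torch.randint(0, 20, (64, 64))
    loss = masked_photometric_loss(rendered, target, labels) \
        + 0.1 * emb.mutual_distillation_loss(ids)
    loss.backward()
```

In this reading, the "greedy supervision" amounts to trusting the segmentation network's pseudo mask outright instead of optimizing a per-image mask predictor jointly with the scene, which is what removes the convergence mismatch the abstract points out.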
