{"title":"通过多视图立体和一致性约束的可推广的三维高斯飞溅","authors":"Yongjuan Yang , Jie Cao , Hong Zhao , Weijie Wang","doi":"10.1016/j.neucom.2025.131696","DOIUrl":null,"url":null,"abstract":"<div><div>Recent neural rendering methods still struggle with fine-grained detail reconstruction and scene generalization, especially when handling complex geometries and low-texture regions. To address these challenges, we propose a 3D Gaussian Splatting (3DGS) framework enhanced by Multi-view Stereo (MVS), aiming to improve both rendering quality and cross-scene adaptability. Specifically, we first introduce an Adaptive Perception-aware Feature Aggregation (APFA) module, which effectively fuses 2D image features into 3D geometry-aware representations via a Local Feature Adaptive Collaboration (LFAC) mechanism and a global Attention-Aware Module (AAM), significantly improving reconstruction performance in challenging scenes. Subsequently, we propose a depth and normal supervision strategy based on multi-view geometric consistency, where aggregated point clouds are utilized for optimized initialization, enhancing stability and fine-grained detail fidelity. Finally, a Gaussian geometric consistency regularization module is introduced to further enforce the coherence between depth and normal predictions, leading to more refined rendering results. Extensive experiments on standard benchmarks including DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples demonstrate that our method outperforms state-of-the-art approaches in terms of PSNR, SSIM, and LPIPS metrics. Particularly in real-world complex scenes, our approach achieves superior generalization ability and perceptual quality, validating the effectiveness of the proposed framework. The code for our method will be made available at <span><span>https://github.com/yangyongjuan/MVS-APFA-GS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"658 ","pages":"Article 131696"},"PeriodicalIF":6.5000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generalizable 3D Gaussian splatting via multi-view stereo and consistency constraints\",\"authors\":\"Yongjuan Yang , Jie Cao , Hong Zhao , Weijie Wang\",\"doi\":\"10.1016/j.neucom.2025.131696\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recent neural rendering methods still struggle with fine-grained detail reconstruction and scene generalization, especially when handling complex geometries and low-texture regions. To address these challenges, we propose a 3D Gaussian Splatting (3DGS) framework enhanced by Multi-view Stereo (MVS), aiming to improve both rendering quality and cross-scene adaptability. Specifically, we first introduce an Adaptive Perception-aware Feature Aggregation (APFA) module, which effectively fuses 2D image features into 3D geometry-aware representations via a Local Feature Adaptive Collaboration (LFAC) mechanism and a global Attention-Aware Module (AAM), significantly improving reconstruction performance in challenging scenes. Subsequently, we propose a depth and normal supervision strategy based on multi-view geometric consistency, where aggregated point clouds are utilized for optimized initialization, enhancing stability and fine-grained detail fidelity. 
Finally, a Gaussian geometric consistency regularization module is introduced to further enforce the coherence between depth and normal predictions, leading to more refined rendering results. Extensive experiments on standard benchmarks including DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples demonstrate that our method outperforms state-of-the-art approaches in terms of PSNR, SSIM, and LPIPS metrics. Particularly in real-world complex scenes, our approach achieves superior generalization ability and perceptual quality, validating the effectiveness of the proposed framework. The code for our method will be made available at <span><span>https://github.com/yangyongjuan/MVS-APFA-GS</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"658 \",\"pages\":\"Article 131696\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225023689\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225023689","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Recent neural rendering methods still struggle with fine-grained detail reconstruction and scene generalization, especially when handling complex geometries and low-texture regions. To address these challenges, we propose a 3D Gaussian Splatting (3DGS) framework enhanced by Multi-view Stereo (MVS), aiming to improve both rendering quality and cross-scene adaptability. Specifically, we first introduce an Adaptive Perception-aware Feature Aggregation (APFA) module, which effectively fuses 2D image features into 3D geometry-aware representations via a Local Feature Adaptive Collaboration (LFAC) mechanism and a global Attention-Aware Module (AAM), significantly improving reconstruction performance in challenging scenes. Subsequently, we propose a depth and normal supervision strategy based on multi-view geometric consistency, where aggregated point clouds are utilized for optimized initialization, enhancing stability and fine-grained detail fidelity. Finally, a Gaussian geometric consistency regularization module is introduced to further enforce the coherence between depth and normal predictions, leading to more refined rendering results. Extensive experiments on standard benchmarks including DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples demonstrate that our method outperforms state-of-the-art approaches in terms of PSNR, SSIM, and LPIPS metrics. Particularly in real-world complex scenes, our approach achieves superior generalization ability and perceptual quality, validating the effectiveness of the proposed framework. The code for our method will be made available at https://github.com/yangyongjuan/MVS-APFA-GS.
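The abstract outlines, but does not specify, how APFA fuses per-view 2D features into a geometry-aware 3D representation. As an illustration only, the following is a minimal PyTorch sketch of the generic pattern its attention-aware component suggests: each 3D sample gathers a feature from every source view, and a small network scores the views (conditioned on cross-view mean and variance, a common proxy for multi-view consistency) before a weighted fusion. Every class name, layer, and dimension here is a hypothetical stand-in rather than the paper's APFA/LFAC/AAM implementation.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Fuse per-view 2D features at each 3D sample into one descriptor (sketch)."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        # Scores each view's feature, conditioned on the cross-view mean and
        # variance so the weighting can react to multi-view (in)consistency.
        self.score = nn.Sequential(
            nn.Linear(feat_dim * 3, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (P, V, C) = P sampled 3D points, V source views, C channels.
        mean = view_feats.mean(dim=1, keepdim=True)                 # (P, 1, C)
        var = view_feats.var(dim=1, unbiased=False, keepdim=True)   # (P, 1, C)
        ctx = torch.cat(
            (view_feats, mean.expand_as(view_feats), var.expand_as(view_feats)),
            dim=-1,
        )                                                           # (P, V, 3C)
        weights = torch.softmax(self.score(ctx), dim=1)             # (P, V, 1)
        return (weights * view_feats).sum(dim=1)                    # (P, C)

# Usage: fused = AttentionAggregator(32)(torch.randn(1024, 4, 32))
```

Conditioning the scores on the cross-view variance is a widely used heuristic (low variance across views tends to indicate a point on a consistent surface), which is one plausible reading of how a global attention module could help in low-texture regions.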
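Likewise, the exact form of the Gaussian geometric consistency regularization is not given in the abstract. A standard way to enforce coherence between depth and normal predictions, sketched below assuming per-pixel rendered depth and normal maps, is to derive normals from depth gradients and penalize their angular disagreement with the rendered normals. The image-space height-field approximation and both function names are assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def normals_from_depth(depth: torch.Tensor) -> torch.Tensor:
    """Approximate per-pixel normals from a depth map of shape (H, W).

    Treats the depth map as a height field in image space; a full version
    would back-project the gradients using the camera intrinsics.
    """
    dzdy, dzdx = torch.gradient(depth)  # same-shape gradients along rows/cols
    n = torch.stack((-dzdx, -dzdy, torch.ones_like(depth)), dim=-1)
    return F.normalize(n, dim=-1)       # (H, W, 3), unit length

def depth_normal_consistency_loss(
    depth: torch.Tensor, normals: torch.Tensor
) -> torch.Tensor:
    """Penalize disagreement between depth-derived and rendered normals.

    depth: (H, W) rendered depth; normals: (H, W, 3) rendered unit normals.
    Returns a scalar that is 0 when the two normal fields are aligned.
    """
    cos = (normals_from_depth(depth) * normals).sum(dim=-1)
    return (1.0 - cos).mean()
```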
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.