TSGaussian: Semantic and depth-guided Target-Specific Gaussian Splatting from sparse views

Impact Factor 4.2; CAS Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2025-08-16 DOI:10.1016/j.imavis.2025.105706

Liang Zhao, Zehan Bao, Yi Xie, Hong Chen, Yaohui Chen, Weifu Li

Volume 162, Article 105706. Publisher page: https://www.sciencedirect.com/science/article/pii/S026288562500294X

Citations: 0

Abstract

Recent advances in Gaussian Splatting have enabled both panoptic and interactive segmentation of 3D scenes. However, existing methods often overlook the need to reconstruct specified targets with complex structures from sparse views. To address this issue, we introduce TSGaussian, a framework that combines semantic constraints with depth priors to avoid geometry degradation in challenging novel view synthesis tasks. Our approach concentrates computational resources on designated targets while minimizing the resources allocated to the background. Bounding boxes from YOLOv9 serve as prompts for the Segment Anything Model to generate 2D mask predictions, ensuring semantic accuracy and cost efficiency. TSGaussian clusters 3D Gaussians by introducing a compact identity encoding for each Gaussian ellipsoid and incorporating 3D spatial consistency regularization. Leveraging these modules, we propose a pruning strategy that reduces redundancy in 3D Gaussians. Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets and a self-built dataset, achieving superior results in novel view synthesis of specific objects. Code is available at: https://github.com/leon2000-ai/TSGaussian.
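To make the idea of identity-based pruning concrete, the following is a minimal sketch and not the authors' implementation. It assumes each Gaussian carries a small vector of identity logits over object IDs and that pruning keeps only the Gaussians whose softmax probability for the target ID passes a threshold. The names `Gaussian`, `prune_to_target`, and `min_prob` are hypothetical and do not come from the TSGaussian repository.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian:
    # Centre of the Gaussian ellipsoid in world coordinates.
    position: Tuple[float, float, float]
    # Hypothetical compact identity encoding: one logit per object ID.
    identity: List[float]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prune_to_target(gaussians: List[Gaussian], target_id: int,
                    min_prob: float = 0.5) -> List[Gaussian]:
    """Keep Gaussians whose identity distribution assigns at least
    min_prob to target_id; everything else is treated as redundant.
    The threshold rule is an assumption made for illustration."""
    return [g for g in gaussians
            if softmax(g.identity)[target_id] >= min_prob]


# Toy scene: object ID 0 is the target, object ID 1 is background.
scene = [
    Gaussian((0.0, 0.0, 1.0), [4.0, 0.0]),
    Gaussian((0.1, 0.0, 1.0), [3.0, 0.5]),
    Gaussian((5.0, 2.0, 9.0), [0.0, 4.0]),
]
target_only = prune_to_target(scene, target_id=0)
```

In this toy scene the two Gaussians near the origin are kept and the distant background Gaussian is dropped. A real pipeline would operate on batched tensors and would combine this step with the depth and spatial consistency terms described in the abstract.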



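Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months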

About the journal: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.