Cues3D：释放鞋底NeRF的力量，在开放词汇3D全景分割中实现一致和独特的实例

IF 14.7 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-04-05 DOI:10.1016/j.inffus.2025.103164

Feng Xue , Wenzhuang Xu , Guofeng Zhong , Anlong Ming , Nicu Sebe

{"title":"Cues3D：释放鞋底NeRF的力量，在开放词汇3D全景分割中实现一致和独特的实例","authors":"Feng Xue , Wenzhuang Xu , Guofeng Zhong , Anlong Ming , Nicu Sebe","doi":"10.1016/j.inffus.2025.103164","DOIUrl":null,"url":null,"abstract":"<div><div>Open-vocabulary 3D panoptic segmentation has recently emerged as a significant trend. Top-performing methods currently integrate 2D segmentation with geometry-aware 3D primitives. However, the advantage would be lost without high-fidelity 3D point clouds, such as methods based on Neural Radiance Field (NeRF). These methods are limited by the insufficient capacity to maintain consistency across partial observations. To address this, recent works have utilized contrastive loss or cross-view association pre-processing for view consensus. In contrast to them, we present <strong>Cues3D</strong>, a compact approach that relies solely on NeRF instead of pre-associations. The core idea is that NeRF’s implicit 3D field inherently establishes a globally consistent geometry, enabling effective object distinction without explicit cross-view supervision. We propose a three-phase training framework for NeRF, <em>initialization</em>-<em>disambiguation</em>-<em>refinement</em>, whereby the instance IDs are corrected using the initially-learned knowledge. Additionally, an instance disambiguation method is proposed to match NeRF-rendered 3D masks and ensure globally unique 3D instance identities. With the aid of Cues3D, we obtain highly consistent and unique 3D instance ID for each object across views with a balanced version of NeRF. Our experiments are conducted on ScanNet v2, ScanNet200, ScanNet++, and Replica datasets for 3D instance, panoptic, and semantic segmentation tasks. Cues3D outperforms other 2D image-based methods and competes with the latest 2D-3D merging based methods, while even surpassing them when using additional 3D point clouds. The code link could be found in the appendix and will be released on <span><span><em>github</em></span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"122 ","pages":"Article 103164"},"PeriodicalIF":14.7000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cues3D: Unleashing the power of sole NeRF for consistent and unique instances in open-vocabulary 3D panoptic segmentation\",\"authors\":\"Feng Xue , Wenzhuang Xu , Guofeng Zhong , Anlong Ming , Nicu Sebe\",\"doi\":\"10.1016/j.inffus.2025.103164\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Open-vocabulary 3D panoptic segmentation has recently emerged as a significant trend. Top-performing methods currently integrate 2D segmentation with geometry-aware 3D primitives. However, the advantage would be lost without high-fidelity 3D point clouds, such as methods based on Neural Radiance Field (NeRF). These methods are limited by the insufficient capacity to maintain consistency across partial observations. To address this, recent works have utilized contrastive loss or cross-view association pre-processing for view consensus. In contrast to them, we present <strong>Cues3D</strong>, a compact approach that relies solely on NeRF instead of pre-associations. The core idea is that NeRF’s implicit 3D field inherently establishes a globally consistent geometry, enabling effective object distinction without explicit cross-view supervision. We propose a three-phase training framework for NeRF, <em>initialization</em>-<em>disambiguation</em>-<em>refinement</em>, whereby the instance IDs are corrected using the initially-learned knowledge. Additionally, an instance disambiguation method is proposed to match NeRF-rendered 3D masks and ensure globally unique 3D instance identities. With the aid of Cues3D, we obtain highly consistent and unique 3D instance ID for each object across views with a balanced version of NeRF. Our experiments are conducted on ScanNet v2, ScanNet200, ScanNet++, and Replica datasets for 3D instance, panoptic, and semantic segmentation tasks. Cues3D outperforms other 2D image-based methods and competes with the latest 2D-3D merging based methods, while even surpassing them when using additional 3D point clouds. The code link could be found in the appendix and will be released on <span><span><em>github</em></span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"122 \",\"pages\":\"Article 103164\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2025-04-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525002374\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525002374","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

开放词汇三维全视分割是近年来出现的一个重要趋势。目前表现最好的方法是将2D分割与几何感知的3D基元相结合。然而，如果没有高保真的3D点云，比如基于神经辐射场（NeRF）的方法，这种优势就会失去。这些方法受到维持部分观测结果一致性的能力不足的限制。为了解决这个问题，最近的研究利用对比损失或跨视图关联预处理来实现视图一致性。与它们相反，我们提出了Cues3D，这是一种仅依赖于NeRF而不是预关联的紧凑方法。核心思想是，NeRF的隐式3D场固有地建立了全局一致的几何形状，无需显式的交叉视图监督即可实现有效的对象区分。我们为NeRF提出了一个三阶段的训练框架，即初始化-消歧义-精化，其中使用初始学习的知识来纠正实例id。此外，提出了一种实例消歧方法来匹配nerf渲染的3D掩码，并确保全局唯一的3D实例身份。借助Cues3D，我们获得高度一致和唯一的3D实例ID为每个对象跨视图与NeRF的平衡版本。我们的实验是在ScanNet v2， ScanNet200， scannet++和Replica数据集上进行的，用于3D实例，全景和语义分割任务。Cues3D优于其他基于2D图像的方法，并与最新的基于2D-3D合并的方法竞争，而在使用额外的3D点云时甚至超过了它们。代码链接可以在附录中找到，并将在github上发布。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cues3D: Unleashing the power of sole NeRF for consistent and unique instances in open-vocabulary 3D panoptic segmentation

Open-vocabulary 3D panoptic segmentation has recently emerged as a significant trend. Top-performing methods currently integrate 2D segmentation with geometry-aware 3D primitives. However, the advantage would be lost without high-fidelity 3D point clouds, such as methods based on Neural Radiance Field (NeRF). These methods are limited by the insufficient capacity to maintain consistency across partial observations. To address this, recent works have utilized contrastive loss or cross-view association pre-processing for view consensus. In contrast to them, we present Cues3D, a compact approach that relies solely on NeRF instead of pre-associations. The core idea is that NeRF’s implicit 3D field inherently establishes a globally consistent geometry, enabling effective object distinction without explicit cross-view supervision. We propose a three-phase training framework for NeRF, initialization-disambiguation-refinement, whereby the instance IDs are corrected using the initially-learned knowledge. Additionally, an instance disambiguation method is proposed to match NeRF-rendered 3D masks and ensure globally unique 3D instance identities. With the aid of Cues3D, we obtain highly consistent and unique 3D instance ID for each object across views with a balanced version of NeRF. Our experiments are conducted on ScanNet v2, ScanNet200, ScanNet++, and Replica datasets for 3D instance, panoptic, and semantic segmentation tasks. Cues3D outperforms other 2D image-based methods and competes with the latest 2D-3D merging based methods, while even surpassing them when using additional 3D point clouds. The code link could be found in the appendix and will be released on github.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Information Fusion 工程技术-计算机：理论方法

CiteScore

33.20

自引率

4.30%

发文量

161

审稿时长

7.9 months

期刊介绍： Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.