SGSG: Stroke-Guided Scene Graph Generation.

Impact Factor: 6.5
Qixiang Ma, Runze Fan, Lizhi Zhao, Jian Wu, Sio-Kei Im, Lili Wang
{"title":"描边引导场景图生成。","authors":"Qixiang Ma, Runze Fan, Lizhi Zhao, Jian Wu, Sio-Kei Im, Lili Wang","doi":"10.1109/TVCG.2025.3616751","DOIUrl":null,"url":null,"abstract":"<p><p>3D scene graph generation is essential for spatial computing in Extended Reality (XR), providing structured semantics for task planning and intelligent perception. However, unlike instance-segmentation-driven setups, generating semantic scene graphs still suffer from limited accuracy due to coarse and noisy point cloud data typically acquired in practice, and from the lack of interactive strategies to incorporate users, spatialized and intuitive guidance. We identify three key challenges: designing controllable interaction forms, involving guidance in inference, and generalizing from local corrections. To address these, we propose SGSG, a Stroke-Guided Scene Graph generation method that enables users to interactively refine 3D semantic relationships and improve predictions in real time. We propose three types of strokes and a lightweight SGstrokes dataset tailored for this modality. Our model integrates stroke guidance representation and injection for spatio-temporal feature learning and reasoning correction, along with intervention losses that combine consistency-repulsive and geometry-sensitive constraints to enhance accuracy and generalization. Experiments and the user study show that SGSG outperforms state-of-the-art methods 3DSSG and SGFN in overall accuracy and precision, surpasses JointSSG in predicate-level metrics, and reduces task load across all control conditions, establishing SGSG as a new benchmark for interactive 3D scene graph generation and semantic understanding in XR. Implementation resources are available at: https://github.com/Sycamore-Ma/SGSG-runtime.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SGSG: Stroke-Guided Scene Graph Generation.\",\"authors\":\"Qixiang Ma, Runze Fan, Lizhi Zhao, Jian Wu, Sio-Kei Im, Lili Wang\",\"doi\":\"10.1109/TVCG.2025.3616751\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>3D scene graph generation is essential for spatial computing in Extended Reality (XR), providing structured semantics for task planning and intelligent perception. However, unlike instance-segmentation-driven setups, generating semantic scene graphs still suffer from limited accuracy due to coarse and noisy point cloud data typically acquired in practice, and from the lack of interactive strategies to incorporate users, spatialized and intuitive guidance. We identify three key challenges: designing controllable interaction forms, involving guidance in inference, and generalizing from local corrections. To address these, we propose SGSG, a Stroke-Guided Scene Graph generation method that enables users to interactively refine 3D semantic relationships and improve predictions in real time. We propose three types of strokes and a lightweight SGstrokes dataset tailored for this modality. Our model integrates stroke guidance representation and injection for spatio-temporal feature learning and reasoning correction, along with intervention losses that combine consistency-repulsive and geometry-sensitive constraints to enhance accuracy and generalization. 
Experiments and the user study show that SGSG outperforms state-of-the-art methods 3DSSG and SGFN in overall accuracy and precision, surpasses JointSSG in predicate-level metrics, and reduces task load across all control conditions, establishing SGSG as a new benchmark for interactive 3D scene graph generation and semantic understanding in XR. Implementation resources are available at: https://github.com/Sycamore-Ma/SGSG-runtime.</p>\",\"PeriodicalId\":94035,\"journal\":{\"name\":\"IEEE transactions on visualization and computer graphics\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on visualization and computer graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TVCG.2025.3616751\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2025.3616751","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract


3D scene graph generation is essential for spatial computing in Extended Reality (XR), providing structured semantics for task planning and intelligent perception. However, unlike instance-segmentation-driven setups, semantic scene graph generation still suffers from limited accuracy due to the coarse and noisy point cloud data typically acquired in practice, and from the lack of interactive strategies for incorporating users' spatialized and intuitive guidance. We identify three key challenges: designing controllable interaction forms, involving guidance in inference, and generalizing from local corrections. To address these, we propose SGSG, a Stroke-Guided Scene Graph generation method that enables users to interactively refine 3D semantic relationships and improve predictions in real time. We propose three types of strokes and a lightweight SGstrokes dataset tailored to this modality. Our model integrates stroke guidance representation and injection for spatio-temporal feature learning and reasoning correction, along with intervention losses that combine consistency-repulsive and geometry-sensitive constraints to enhance accuracy and generalization. Experiments and the user study show that SGSG outperforms the state-of-the-art methods 3DSSG and SGFN in overall accuracy and precision, surpasses JointSSG in predicate-level metrics, and reduces task load across all control conditions, establishing SGSG as a new benchmark for interactive 3D scene graph generation and semantic understanding in XR. Implementation resources are available at: https://github.com/Sycamore-Ma/SGSG-runtime.
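
The abstract names intervention losses that combine consistency-repulsive and geometry-sensitive constraints but gives no formulation. Purely as an illustrative sketch of how a single stroke correction might be turned into such a training signal, the PyTorch snippet below pairs a consistency/margin term on the corrected edge with a similarity-weighted propagation term. Every name, shape, and weighting here (intervention_loss, margin, lambda_geo, the cosine-similarity propagation) is an assumption of this sketch, not the paper's actual definition.

```python
# Hypothetical sketch only: SGSG's actual loss formulation is not given in
# the abstract. This illustrates one plausible way to combine a
# consistency-repulsive constraint on a stroke-corrected edge with a
# geometry-sensitive term that propagates the correction to similar edges.
import torch
import torch.nn.functional as F


def intervention_loss(
    logits: torch.Tensor,         # (E, P) predicate logits for E edges
    corrected_edge: int,          # index of the edge the stroke corrected
    target_predicate: int,        # predicate the user asserted
    rejected_predicate: int,      # predicate the user crossed out
    edge_geometry: torch.Tensor,  # (E, D) geometric features per edge
    margin: float = 1.0,          # assumed margin for the repulsive term
    lambda_geo: float = 0.1,      # assumed weight of the geometry term
) -> torch.Tensor:
    log_probs = F.log_softmax(logits, dim=-1)

    # Consistency: the corrected edge should predict the asserted predicate.
    consistency = -log_probs[corrected_edge, target_predicate]

    # Repulsive: the rejected predicate should score at least `margin`
    # below the asserted one on the corrected edge.
    repulsive = F.relu(
        margin
        - (logits[corrected_edge, target_predicate]
           - logits[corrected_edge, rejected_predicate])
    )

    # Geometry-sensitive generalization: edges geometrically similar to the
    # corrected one are softly pulled toward the same predicate, weighted by
    # (non-negative) cosine similarity of their geometric features.
    sim = F.cosine_similarity(
        edge_geometry, edge_geometry[corrected_edge].unsqueeze(0), dim=-1
    ).clamp(min=0.0)
    mask = torch.ones_like(sim)
    mask[corrected_edge] = 0.0    # the corrected edge is handled above
    sim = sim * mask
    geo = -(sim * log_probs[:, target_predicate]).sum() / (sim.sum() + 1e-6)

    return consistency + repulsive + lambda_geo * geo


# Toy call with random tensors (shapes only, not real model output):
E, P, D = 8, 6, 16
loss = intervention_loss(torch.randn(E, P), 0, 2, 4, torch.randn(E, D))
```

The actual loss definitions, stroke encodings, and guidance-injection mechanism are specified in the paper and the repository linked above.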
