Semantic Scene Completion via Semantic-Aware Guidance and Interactive Refinement Transformer

IF 8.3 · JCR Q1 · Region 1 (Engineering & Technology) · ENGINEERING, ELECTRICAL & ELECTRONIC
Haihong Xiao;Wenxiong Kang;Hao Liu;Yuqiong Li;Ying He
DOI: 10.1109/TCSVT.2024.3518493
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 5, pp. 4212-4225
Published: 2024-12-16 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10804191/
Citations: 0

Abstract

Semantic Scene Completion via Semantic-Aware Guidance and Interactive Refinement Transformer
Predicting per-voxel occupancy status and corresponding semantic labels in 3D scenes is pivotal to 3D intelligent perception in autonomous driving. In this paper, we propose a novel semantic scene completion framework that can generate complete 3D volumetric semantics from a single image at a low cost. To the best of our knowledge, this is the first endeavor specifically aimed at mitigating the negative impacts of incorrect voxel query proposals caused by erroneous depth estimates and enhancing interactions for positive ones in camera-based semantic scene completion tasks. Specifically, we present a straightforward yet effective Semantic-aware Guided (SAG) module, which seamlessly integrates with task-related semantic priors to facilitate effective interactions between image features and voxel query proposals in a plug-and-play manner. Furthermore, we introduce a set of learnable object queries to better perceive objects within the scene. Building on this, we propose an Interactive Refinement Transformer (IRT) block, which iteratively updates voxel query proposals to enhance the perception of semantics and objects within the scene by leveraging the interaction between object queries and voxel queries through query-to-query cross-attention. Extensive experiments demonstrate that our method outperforms existing state-of-the-art approaches, achieving overall improvements of 0.30 and 2.74 in mIoU metric on the SemanticKITTI and SSCBench-KITTI-360 validation datasets, respectively, while also showing superior performance in the aspect of small object generation.
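The abstract describes the IRT block as iteratively refining voxel query proposals via query-to-query cross-attention against a set of learnable object queries. The authors' actual architecture is in the paper; purely as a hypothetical NumPy sketch of that cross-attention idea (all shapes, names, and the residual update are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d):
    """Single-head cross-attention: each row of `queries` attends over `context`.

    queries: (Nq, d) voxel query proposals
    context: (No, d) learnable object queries (serve as keys and values here)
    """
    scores = queries @ context.T / np.sqrt(d)   # (Nq, No) similarity logits
    attn = softmax(scores, axis=-1)             # each voxel query's weights over objects
    return attn @ context                       # (Nq, d) object-aware update

rng = np.random.default_rng(0)
d = 64
voxel_q = rng.standard_normal((128, d))  # hypothetical voxel query proposals
obj_q = rng.standard_normal((16, d))     # hypothetical learnable object queries

# One illustrative refinement iteration: residual update of voxel queries.
refined = voxel_q + cross_attention(voxel_q, obj_q, d)
```

In a real transformer block the queries, keys, and values would pass through learned projections and the update would be interleaved with normalization and feed-forward layers; this sketch only shows the query-to-query attention pattern the abstract refers to.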
Source journal
CiteScore: 13.80
Self-citation rate: 27.40%
Annual publications: 660
Time to review: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.