Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang
arXiv - CS - Computer Vision and Pattern Recognition, published 2024-09-12. DOI: arxiv-2409.07972 (https://doi.org/arxiv-2409.07972). Source platform: Semanticscholar.
Citations: 0

Abstract

The task of vision-based 3D occupancy prediction aims to reconstruct 3D geometry and estimate its semantic classes from 2D color images, where the 2D-to-3D view transformation is an indispensable step. Most previous methods conduct forward projection, such as BEVPooling and VoxelPooling, both of which map the 2D image features into 3D grids. However, a grid that is meant to represent features within a certain height range usually also receives many confusing features that belong to other height ranges. To address this challenge, we present Deep Height Decoupling (DHD), a novel framework that incorporates an explicit height prior to filter out the confusing features. Specifically, DHD first predicts height maps via explicit supervision. Based on the height distribution statistics, DHD designs Mask Guided Height Sampling (MGHS) to adaptively decouple the height map into multiple binary masks. MGHS projects the 2D image features into multiple subspaces, where each grid contains features within reasonable height ranges. Finally, a Synergistic Feature Aggregation (SFA) module is deployed to enhance the feature representation through channel and spatial affinities, enabling further occupancy refinement. On the popular Occ3D-nuScenes benchmark, our method achieves state-of-the-art performance even with minimal input frames. Code is available at https://github.com/yanzq95/DHD.
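The idea of decoupling a height map into binary masks can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that): the function names `decouple_height_map` and `mask_features`, the fixed `bin_edges`, and the toy tensor shapes are all assumptions made for illustration. The abstract says the intervals are derived from height distribution statistics, whereas the edges below are simply hard-coded.

```python
import numpy as np


def decouple_height_map(height_map, bin_edges):
    """Split an (H, W) height map into one binary mask per height interval.

    Hypothetical helper: interval k covers bin_edges[k] <= h < bin_edges[k + 1].
    """
    # np.digitize returns i such that bin_edges[i - 1] <= h < bin_edges[i];
    # 0 and len(bin_edges) mark values outside the covered range.
    idx = np.digitize(height_map, bin_edges)
    num_bins = len(bin_edges) - 1
    masks = np.stack([idx == k + 1 for k in range(num_bins)], axis=0)
    return masks.astype(np.float32)  # shape (K, H, W)


def mask_features(features, masks):
    """Gate a (C, H, W) feature map with each mask, giving (K, C, H, W)."""
    # Broadcasting: (K, 1, H, W) * (1, C, H, W) -> (K, C, H, W).
    return masks[:, None, :, :] * features[None, :, :, :]


# Toy per-pixel heights in metres (assumed values, not from the paper).
height_map = np.array([[-0.5, 0.2], [1.5, 3.0]], dtype=np.float32)
# Hypothetical interval edges; hard-coded here for illustration only.
bin_edges = np.array([-1.0, 0.0, 2.0, 4.0])
features = np.ones((3, 2, 2), dtype=np.float32)

masks = decouple_height_map(height_map, bin_edges)
subspace_features = mask_features(features, masks)
print(masks.shape, subspace_features.shape)
```

In this toy setting every pixel falls into exactly one interval, so each feature contributes to a single height subspace. How the masked features are then projected into 3D grids is outside the scope of this sketch.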