A hierarchical occupancy network with multi‐height attention for vision‐centric 3D occupancy prediction

Can Li, Zhi Gao, Zhipeng Lin, Tonghui Ye, Ziyao Li
{"title":"A hierarchical occupancy network with multi‐height attention for vision‐centric 3D occupancy prediction","authors":"Can Li, Zhi Gao, Zhipeng Lin, Tonghui Ye, Ziyao Li","doi":"10.1111/phor.12500","DOIUrl":null,"url":null,"abstract":"The precise geometric representation and ability to handle long‐tail targets have led to the increasing attention towards vision‐centric 3D occupancy prediction, which models the real world as a voxel‐wise model solely through visual inputs. Despite some notable achievements in this field, many prior or concurrent approaches simply adapt existing spatial cross‐attention (SCA) as their 2D–3D transformation module, which may lead to informative coupling or compromise the global receptive field along the height dimension. To overcome these limitations, we propose a hierarchical occupancy (HierOcc) network featuring our innovative height‐aware cross‐attention (HACA) and hierarchical self‐attention (HSA) as its core modules to achieve enhanced precision and completeness in 3D occupancy prediction. The former module enables 2D–3D transformation, while the latter promotes voxels’ intercommunication. The key insight behind both modules is our multi‐height attention mechanism which ensures each attention head corresponds explicitly to a specific height, thereby decoupling height information while maintaining global attention across the height dimension. Extensive experiments show that our method brings significant improvements compared to baseline and surpasses all concurrent methods, demonstrating its superiority.","PeriodicalId":22881,"journal":{"name":"The Photogrammetric Record","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Photogrammetric Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1111/phor.12500","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The precise geometric representation and ability to handle long‐tail targets have led to the increasing attention towards vision‐centric 3D occupancy prediction, which models the real world as a voxel‐wise model solely through visual inputs. Despite some notable achievements in this field, many prior or concurrent approaches simply adapt existing spatial cross‐attention (SCA) as their 2D–3D transformation module, which may lead to informative coupling or compromise the global receptive field along the height dimension. To overcome these limitations, we propose a hierarchical occupancy (HierOcc) network featuring our innovative height‐aware cross‐attention (HACA) and hierarchical self‐attention (HSA) as its core modules to achieve enhanced precision and completeness in 3D occupancy prediction. The former module enables 2D–3D transformation, while the latter promotes voxels’ intercommunication. The key insight behind both modules is our multi‐height attention mechanism which ensures each attention head corresponds explicitly to a specific height, thereby decoupling height information while maintaining global attention across the height dimension. Extensive experiments show that our method brings significant improvements compared to baseline and surpasses all concurrent methods, demonstrating its superiority.
多高度关注的分层占用网络,用于以视觉为中心的三维占用预测
精确的几何表示和处理长尾目标的能力使人们越来越关注以视觉为中心的三维占位预测,这种预测仅通过视觉输入将真实世界建模为一个体素模型。尽管在这一领域取得了一些显著成就,但许多先前或同时出现的方法只是将现有的空间交叉注意(SCA)作为其 2D-3D 转换模块,这可能会导致信息耦合或损害沿高度维度的全局感受野。为了克服这些局限性,我们提出了分层占位(HierOcc)网络,以创新的高度感知交叉注意(HACA)和分层自注意(HSA)为核心模块,从而提高三维占位预测的精度和完整性。前者实现了 2D-3D 转换,后者促进了体素之间的互通。这两个模块背后的关键见解是我们的多高度注意机制,它确保每个注意头明确对应于特定高度,从而在高度维度上保持全局注意的同时解耦高度信息。广泛的实验表明,与基线相比,我们的方法带来了显著的改进,并超越了所有并行方法,证明了它的优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信