IAE-BEV: Instance-Adaptive Enhancement for BEV-Based Multi-View 3D Object Detection

IF 5.3 · Zone 2, Computer Science · JCR Q2, Robotics

IEEE Robotics and Automation Letters Pub Date : 2025-09-02 DOI:10.1109/LRA.2025.3604760

Bin Wang;Qiang Zhao;Chongben Tao;Yaoqi Sun;Chenggang Yan

Vol. 10, No. 10, pp. 10610-10617 · Journal Article · https://ieeexplore.ieee.org/document/11146628/

Citations: 0

Abstract

Camera-based Bird's-Eye-View (BEV) representation has become a viable solution for 3D object detection in cost-effective autonomous driving. Currently, the explicit paradigm based on the lift-splat-shoot (LSS) pipeline has become one of the mainstream methods due to its efficiency and ease of deployment. However, the process of flattening the spatial representation in this pipeline mixes target features with excessive background noise. In addition, the generated BEV features are sparse due to the inherent characteristics of camera imaging. To address these limitations, we propose IAE-BEV, a novel two-stage multi-view 3D object detector that adaptively integrates instance features into BEV features, ultimately constructing a BEV representation that highlights instance information and alleviates sparsity. We also introduce the Occupancy Mask Pool, which enables instance features to interact with the 2D image plane in a more targeted and efficient manner. To further distinguish instances along the same camera ray, we design Angle-Adaptive Self-Attention, which learns appropriate feature weights under the guidance of queries. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and generalizability of the proposed framework, which achieves up to a +2.96% mAP improvement over a state-of-the-art LSS-based baseline.
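
The two limitations the abstract attributes to the LSS pipeline (flattening mixes object and background features, and the resulting BEV grid is sparse) can be seen in a toy example. The sketch below is not the IAE-BEV implementation and not the original LSS code; it is a minimal pure-Python illustration of the generic lift and splat steps, and every name (`lift`, `splat`, `sparsity`, `demo`), the grid size, and all numbers are hypothetical.

```python
def lift(feature, depth_probs):
    """Lift: outer product of one pixel's feature vector with its depth distribution."""
    return [[p * f for f in feature] for p in depth_probs]


def splat(points, grid_size, cell):
    """Splat: sum-pool (x, y, z, feature) points into a BEV grid.

    The height z is ignored, which is the flattening step: features at
    different heights that share an (x, y) cell are added together.
    """
    channels = len(points[0][3])
    bev = [[[0.0] * channels for _ in range(grid_size)] for _ in range(grid_size)]
    for x, y, _z, feat in points:
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < grid_size and 0 <= j < grid_size:
            for c, v in enumerate(feat):
                bev[i][j][c] += v
    return bev


def sparsity(bev):
    """Fraction of BEV cells that received no feature at all."""
    cells = [cell for row in bev for cell in row]
    empty = sum(1 for cell in cells if not any(cell))
    return empty / len(cells)


def demo():
    depths = [5.0, 10.0, 15.0]  # assumed depth bins along one ray, in metres
    # Object pixel (channel 0), depth mass concentrated near 10 m.
    obj = lift([1.0, 0.0], [0.1, 0.8, 0.1])
    # Background pixel (channel 1) higher up, same ray direction in BEV.
    bg = lift([0.0, 1.0], [0.2, 0.6, 0.2])
    points = []
    for d, f_obj, f_bg in zip(depths, obj, bg):
        points.append((d, 2.0, 0.5, f_obj))  # near the ground
        points.append((d, 2.0, 3.0, f_bg))   # higher up, same (x, y)
    bev = splat(points, grid_size=20, cell=1.0)
    return bev, sparsity(bev)


if __name__ == "__main__":
    bev, empty_fraction = demo()
    print(bev[10][2], empty_fraction)
```

In this toy setup the cell at 10 m ends up holding both the object channel and the background channel, and all but three of the 400 cells stay empty. These are the mixing and sparsity effects that, according to the abstract, instance-adaptive enhancement is meant to counteract.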

Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.