Bin Wang;Qiang Zhao;Chongben Tao;Yaoqi Sun;Chenggang Yan
DOI: 10.1109/LRA.2025.3604760
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 10610-10617
Published: 2025-09-02 (Journal Article)
URL: https://ieeexplore.ieee.org/document/11146628/
IAE-BEV: Instance-Adaptive Enhancement for BEV-Based Multi-View 3D Object Detection
Camera-based Bird's-Eye-View (BEV) representation has become a viable solution for 3D object detection in cost-effective autonomous driving. Currently, the explicit paradigm based on the Lift-Splat-Shoot (LSS) pipeline is one of the mainstream approaches due to its efficiency and ease of deployment. However, flattening the spatial representation in this pipeline mixes target features with excessive background noise. In addition, the generated BEV features are sparse due to the inherent characteristics of camera imaging. To address these limitations, we propose IAE-BEV, a novel two-stage multi-view 3D object detector that adaptively integrates instance features into BEV features, ultimately constructing a BEV representation that highlights instance information and alleviates sparsity. We also introduce the Occupancy Mask Pool, which enables instance features to interact with the 2D image plane in a more targeted and efficient manner. To further distinguish instances along the same camera ray, we design the Angle-Adaptive Self-Attention, which learns appropriate feature weights under the guidance of queries. Extensive experiments on the nuScenes dataset demonstrate the effectiveness and generalizability of our proposed framework, achieving up to +2.96% mAP improvement over a state-of-the-art LSS-based baseline.
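To illustrate the two limitations the abstract attributes to the LSS pipeline (background mixing and sparsity), the following is a minimal, hypothetical sketch of a lift-splat step. It is not the paper's implementation: each pixel "lifts" a scalar feature along its camera ray at a few depth bins, then all lifted points are "splatted" (sum-pooled) onto a BEV grid. The function names and toy data are assumptions for illustration only.

```python
# Hypothetical, simplified lift-splat sketch (not the IAE-BEV implementation).
# Each pixel lifts its feature along a ray at several depth bins, weighted by
# a per-pixel depth distribution; all lifted points are sum-pooled into a BEV grid.

def lift_splat(pixel_features, depth_probs, rays, grid_size):
    """pixel_features: one scalar feature per pixel.
    depth_probs: per-pixel probabilities over depth bins.
    rays: per-pixel list of (x, y) BEV cells, one cell per depth bin.
    grid_size: (W, H) of the BEV grid.
    Returns a dense W x H grid of accumulated features."""
    w, h = grid_size
    bev = [[0.0] * h for _ in range(w)]
    for feat, probs, cells in zip(pixel_features, depth_probs, rays):
        for p, (x, y) in zip(probs, cells):
            # Sum-pooling blends every pixel's contribution into the cell,
            # so foreground features mix with background along the same ray.
            bev[x][y] += p * feat
    return bev

# Two pixels, two depth bins each: most BEV cells receive nothing,
# illustrating the sparsity of camera-generated BEV features.
bev = lift_splat(
    pixel_features=[1.0, 2.0],
    depth_probs=[[0.9, 0.1], [0.5, 0.5]],
    rays=[[(0, 0), (0, 1)], [(1, 0), (1, 1)]],
    grid_size=(4, 4),
)
occupied = sum(1 for row in bev for v in row if v != 0.0)
print(occupied)  # 4 of 16 cells are non-empty
```

Because every depth bin along a ray deposits into the grid, cells behind or in front of an object still accumulate weighted features, which is the "background noise" issue the paper targets; the mostly-empty grid shows the sparsity issue.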
Journal Introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.