Spatial awareness enhancement based single-stage anchor-free 3D object detection for autonomous driving

IF 3.7 · CAS Zone 2 (Engineering & Technology) · JCR Q1 (Computer Science, Hardware & Architecture)

Displays · Pub Date: 2024-09-02 · DOI: 10.1016/j.displa.2024.102821

Xinyu Sun, Lisheng Jin, Huanhuan Wang, Zhen Huo, Yang He, Guangqi Wang

Displays, vol. 85, Article 102821. https://www.sciencedirect.com/science/article/pii/S0141938224001859

Citations: 0

Abstract

Real-time, accurate detection of three-dimensional (3D) objects from LiDAR is a central problem in environment perception for autonomous driving. Compared with two-stage and anchor-based 3D object detection methods, which suffer from inference latency, single-stage anchor-free 3D object detection approaches are better suited to deployment in autonomous vehicles with strict real-time requirements. However, they suffer from insufficient spatial awareness, which can result in detection errors such as false positives and false negatives, thereby increasing the potential risks of autonomous driving. In response, we focus on enhancing the spatial awareness of CenterPoint, a single-stage anchor-free 3D object detector widely used in industry. Considering the limited computational resources available and the performance bottleneck caused by the pillar encoder, we propose an efficient SSDCM backbone to strengthen feature representation and extraction. Furthermore, a simple BGC neck is devised to weight and exchange contextual information in order to deeply fuse multi-scale features. Combining the improved backbone and neck networks, we construct a single-stage anchor-free 3D object detection model with spatial awareness enhancement, named CenterPoint-Spatial Awareness Enhancement (CenterPoint-SAE). We evaluate CenterPoint-SAE on two large-scale and challenging autonomous driving datasets, nuScenes and Waymo. It achieves 53.3% mAP and 62.5% NDS on the nuScenes detection benchmark and runs inference at 11.1 FPS. Compared to the baseline, the upgraded networks deliver a performance improvement of 1.6% mAP and 1.2% NDS at minor cost. Notably, on the Waymo dataset, our method achieves competitive detection performance compared to two-stage and point-based methods.
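The figures quoted in the abstract can be put in context with a little arithmetic. The sketch below is not taken from the paper: it derives the baseline scores implied by the reported gains, assuming those gains are absolute percentage points (the abstract does not say whether they are absolute or relative), and converts the reported throughput into an average per-frame latency. The function names are illustrative only.

```python
# Figures quoted in the abstract (nuScenes detection benchmark).
MAP_SAE = 53.3   # % mAP of CenterPoint-SAE
NDS_SAE = 62.5   # % NDS of CenterPoint-SAE
FPS_SAE = 11.1   # inference throughput, frames per second
GAIN_MAP = 1.6   # reported mAP improvement over the baseline
GAIN_NDS = 1.2   # reported NDS improvement over the baseline


def implied_baseline(score: float, gain: float) -> float:
    """Baseline score, ASSUMING the gain is in absolute percentage points."""
    return round(score - gain, 1)


def latency_ms(fps: float) -> float:
    """Average per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"implied baseline mAP: {implied_baseline(MAP_SAE, GAIN_MAP)}%")
    print(f"implied baseline NDS: {implied_baseline(NDS_SAE, GAIN_NDS)}%")
    print(f"average latency: {latency_ms(FPS_SAE):.1f} ms per frame")
```

Under that assumption the implied baseline is 51.7% mAP and 61.3% NDS, and 11.1 FPS corresponds to roughly 90 ms per frame.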


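Source journal

Displays (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days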

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.