NeXtFusion: Attention-Based Camera-Radar Fusion Network for Improved Three-Dimensional Object Detection and Tracking

Future Internet. Pub Date: 2024-03-28. DOI: 10.3390/fi16040114

Priyank Kalgaonkar, Mohamed El-Sharkawy


Citations: 0

Abstract

Accurate perception is crucial for autonomous vehicles (AVs) to navigate safely, especially in adverse weather and lighting conditions, where single-sensor networks (e.g., camera-only or radar-only) struggle with reduced maneuverability and unrecognizable targets. Deep Camera-Radar fusion neural networks offer a promising solution for reliable AV perception under any weather and lighting conditions: cameras provide rich semantic information, while radars act like X-ray vision, piercing through fog and darkness. This work proposes NeXtFusion, a novel, efficient Camera-Radar fusion network for robust AV perception with improved object detection accuracy and tracking. Its attention module enhances the feature representations that matter most for object detection while minimizing information loss from multi-modal data. Extensive experiments on the challenging nuScenes dataset show that NeXtFusion outperforms other methods in detecting small and distant objects. Notably, NeXtFusion achieves the highest mAP score (0.473) on the nuScenes validation set, outperforming competitors such as OFT (35.1% improvement) and MonoDIS (9.5% improvement). It also performs strongly on other metrics, such as mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Visualizations of nuScenes data processed by NeXtFusion further demonstrate its ability to handle diverse real-world scenarios. These results suggest that NeXtFusion is a promising deep fusion network for improving AV perception and the safety of autonomous driving.
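
The abstract does not describe the attention module's internal design. Purely as an illustration of the general idea, the sketch below shows one common pattern, squeeze-and-excitation-style channel attention, applied to concatenated camera and radar feature maps. The function name `channel_attention_fusion`, the toy weights, and the list-of-lists feature layout are all assumptions made for this example and should not be read as NeXtFusion's actual architecture.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fusion(camera_feat, radar_feat, w1, w2):
    """Fuse two feature maps with channel attention (hypothetical sketch).

    Each feature map is a list of channels; each channel is a flat list of
    spatial activations. w1 has shape (R, 2C) and w2 has shape (2C, R).
    """
    # Concatenate camera and radar channels along the channel axis.
    stacked = camera_feat + radar_feat
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in stacked]
    # Excite: bottleneck layer with ReLU, then per-channel sigmoid gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight every channel by its learned importance.
    reweighted = [[wt * v for v in ch] for wt, ch in zip(weights, stacked)]
    return reweighted, weights


# Toy inputs: 2 channels per sensor, 2 spatial positions per channel.
camera = [[1.0, 3.0], [0.0, 0.0]]
radar = [[2.0, 2.0], [4.0, 0.0]]
# Hand-picked weights (in a real network these would be learned).
w1 = [[0.5, 0.0, 0.5, 0.0]]
w2 = [[1.0], [0.0], [-1.0], [0.0]]

fused_out, weights = channel_attention_fusion(camera, radar, w1, w2)
print(len(fused_out), [round(w, 3) for w in weights])
```

In this toy setup the gates boost the first camera channel and suppress the first radar channel, which is the kind of per-channel emphasis an attention module is meant to provide before the detection head.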

