NeXtFusion: Attention-Based Camera-Radar Fusion Network for Improved Three-Dimensional Object Detection and Tracking

Future Internet · Pub Date: 2024-03-28 · DOI: 10.3390/fi16040114
Priyank Kalgaonkar, Mohamed El-Sharkawy
{"title":"NeXtFusion:基于注意力的摄像头-雷达融合网络,用于改进三维物体检测和跟踪","authors":"Priyank Kalgaonkar, Mohamed El-Sharkawy","doi":"10.3390/fi16040114","DOIUrl":null,"url":null,"abstract":"Accurate perception is crucial for autonomous vehicles (AVs) to navigate safely, especially in adverse weather and lighting conditions where single-sensor networks (e.g., cameras or radar) struggle with reduced maneuverability and unrecognizable targets. Deep Camera-Radar fusion neural networks offer a promising solution for reliable AV perception under any weather and lighting conditions. Cameras provide rich semantic information, while radars act like an X-ray vision, piercing through fog and darkness. This work proposes a novel, efficient Camera-Radar fusion network called NeXtFusion for robust AV perception with an improvement in object detection accuracy and tracking. Our proposed approach of utilizing an attention module enhances crucial feature representation for object detection while minimizing information loss from multi-modal data. Extensive experiments on the challenging nuScenes dataset demonstrate NeXtFusion’s superior performance in detecting small and distant objects compared to other methods. Notably, NeXtFusion achieves the highest mAP score (0.473) on the nuScenes validation set, outperforming competitors like OFT (35.1% improvement) and MonoDIS (9.5% improvement). Additionally, NeXtFusion demonstrates strong performance in other metrics like mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Furthermore, visualizations of nuScenes data processed by NeXtFusion further demonstrate its capability to handle diverse real-world scenarios. These results suggest that NeXtFusion is a promising deep fusion network for improving AV perception and safety for autonomous driving.","PeriodicalId":509567,"journal":{"name":"Future Internet","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NeXtFusion: Attention-Based Camera-Radar Fusion Network for Improved Three-Dimensional Object Detection and Tracking\",\"authors\":\"Priyank Kalgaonkar, Mohamed El-Sharkawy\",\"doi\":\"10.3390/fi16040114\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate perception is crucial for autonomous vehicles (AVs) to navigate safely, especially in adverse weather and lighting conditions where single-sensor networks (e.g., cameras or radar) struggle with reduced maneuverability and unrecognizable targets. Deep Camera-Radar fusion neural networks offer a promising solution for reliable AV perception under any weather and lighting conditions. Cameras provide rich semantic information, while radars act like an X-ray vision, piercing through fog and darkness. This work proposes a novel, efficient Camera-Radar fusion network called NeXtFusion for robust AV perception with an improvement in object detection accuracy and tracking. Our proposed approach of utilizing an attention module enhances crucial feature representation for object detection while minimizing information loss from multi-modal data. Extensive experiments on the challenging nuScenes dataset demonstrate NeXtFusion’s superior performance in detecting small and distant objects compared to other methods. 
Notably, NeXtFusion achieves the highest mAP score (0.473) on the nuScenes validation set, outperforming competitors like OFT (35.1% improvement) and MonoDIS (9.5% improvement). Additionally, NeXtFusion demonstrates strong performance in other metrics like mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Furthermore, visualizations of nuScenes data processed by NeXtFusion further demonstrate its capability to handle diverse real-world scenarios. These results suggest that NeXtFusion is a promising deep fusion network for improving AV perception and safety for autonomous driving.\",\"PeriodicalId\":509567,\"journal\":{\"name\":\"Future Internet\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Internet\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/fi16040114\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Internet","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/fi16040114","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Accurate perception is crucial for autonomous vehicles (AVs) to navigate safely, especially in adverse weather and lighting conditions where single-sensor networks (e.g., cameras or radar) struggle with reduced maneuverability and unrecognizable targets. Deep Camera-Radar fusion neural networks offer a promising solution for reliable AV perception under any weather and lighting conditions. Cameras provide rich semantic information, while radars act like X-ray vision, piercing through fog and darkness. This work proposes a novel, efficient Camera-Radar fusion network called NeXtFusion for robust AV perception with improved object detection accuracy and tracking. The proposed attention module enhances crucial feature representations for object detection while minimizing information loss from multi-modal data. Extensive experiments on the challenging nuScenes dataset demonstrate NeXtFusion’s superior performance in detecting small and distant objects compared to other methods. Notably, NeXtFusion achieves the highest mAP score (0.473) on the nuScenes validation set, outperforming competitors such as OFT (35.1% improvement) and MonoDIS (9.5% improvement). Additionally, NeXtFusion demonstrates strong performance on other metrics such as mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Furthermore, visualizations of nuScenes data processed by NeXtFusion further demonstrate its capability to handle diverse real-world scenarios. These results suggest that NeXtFusion is a promising deep fusion network for improving AV perception and safety for autonomous driving.
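For readers who want a concrete picture of what an attention module for camera-radar feature fusion can look like, the sketch below shows a minimal cross-attention fusion block in PyTorch. It is an illustrative assumption, not the NeXtFusion architecture from the paper: the module name `CameraRadarAttentionFusion`, the channel sizes, and the choice of camera features as queries and radar features as keys/values are hypothetical; consult the published article for the actual design.

```python
# Hypothetical sketch: cross-attention fusion of camera and radar features.
# Names, shapes, and query/key roles are illustrative, not from the paper.
import torch
import torch.nn as nn


class CameraRadarAttentionFusion(nn.Module):
    """Fuses camera and radar features with multi-head cross-attention.

    Camera tokens act as queries; radar tokens provide keys/values, so the
    network can emphasize radar cues (range, velocity) where the image alone
    is ambiguous, e.g., in fog or darkness.
    """

    def __init__(self, cam_channels: int = 256, radar_channels: int = 64,
                 embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cam_proj = nn.Linear(cam_channels, embed_dim)
        self.radar_proj = nn.Linear(radar_channels, embed_dim)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, cam_feats: torch.Tensor,
                radar_feats: torch.Tensor) -> torch.Tensor:
        # cam_feats:   (B, N_cam, C_cam)   flattened camera feature tokens
        # radar_feats: (B, N_radar, C_rad) encoded radar point/pillar tokens
        q = self.cam_proj(cam_feats)
        kv = self.radar_proj(radar_feats)
        fused, _ = self.cross_attn(query=q, key=kv, value=kv)
        # Residual connection preserves camera semantics when radar adds little.
        return self.norm(q + fused)


if __name__ == "__main__":
    fusion = CameraRadarAttentionFusion()
    cam = torch.randn(2, 900, 256)    # e.g., 900 camera tokens per sample
    radar = torch.randn(2, 128, 64)   # e.g., 128 radar tokens per sample
    out = fusion(cam, radar)
    print(out.shape)  # torch.Size([2, 900, 256])
```

The design choice sketched here, attending from image tokens to radar tokens with a residual path, is one common way to let a detector weight sparse but weather-robust radar returns against dense camera semantics; the actual NeXtFusion module may differ in structure and placement.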