CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision Pub Date : 2022-10-17 DOI:10.48550/arXiv.2210.09267

Jyh-Jing Hwang, Henrik Kretzschmar, Joshua M. Manela, Sean M. Rafferty, N. Armstrong-Crews, Tiffany Chen, Drago Anguelov

{"title":"CramNet: Camera-Radar Fusion with Ray-Constrained Cross-Attention for Robust 3D Object Detection","authors":"Jyh-Jing Hwang, Henrik Kretzschmar, Joshua M. Manela, Sean M. Rafferty, N. Armstrong-Crews, Tiffany Chen, Drago Anguelov","doi":"10.48550/arXiv.2210.09267","DOIUrl":null,"url":null,"abstract":"Robust 3D object detection is critical for safe autonomous driving. Camera and radar sensors are synergistic as they capture complementary information and work well under different environmental conditions. Fusing camera and radar data is challenging, however, as each of the sensors lacks information along a perpendicular axis, that is, depth is unknown to camera and elevation is unknown to radar. We propose the camera-radar matching network CramNet, an efficient approach to fuse the sensor readings from camera and radar in a joint 3D space. To leverage radar range measurements for better camera depth predictions, we propose a novel ray-constrained cross-attention mechanism that resolves the ambiguity in the geometric correspondences between camera features and radar features. Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle. We demonstrate the effectiveness of our fusion approach through extensive experiments on the RADIATE dataset, one of the few large-scale datasets that provide radar radio frequency imagery. A camera-only variant of our method achieves competitive performance in monocular 3D object detection on the Waymo Open Dataset.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"11 1","pages":"388-405"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.09267","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

Abstract

Robust 3D object detection is critical for safe autonomous driving. Camera and radar sensors are synergistic as they capture complementary information and work well under different environmental conditions. Fusing camera and radar data is challenging, however, as each of the sensors lacks information along a perpendicular axis, that is, depth is unknown to camera and elevation is unknown to radar. We propose the camera-radar matching network CramNet, an efficient approach to fuse the sensor readings from camera and radar in a joint 3D space. To leverage radar range measurements for better camera depth predictions, we propose a novel ray-constrained cross-attention mechanism that resolves the ambiguity in the geometric correspondences between camera features and radar features. Our method supports training with sensor modality dropout, which leads to robust 3D object detection, even when a camera or radar sensor suddenly malfunctions on a vehicle. We demonstrate the effectiveness of our fusion approach through extensive experiments on the RADIATE dataset, one of the few large-scale datasets that provide radar radio frequency imagery. A camera-only variant of our method achieves competitive performance in monocular 3D object detection on the Waymo Open Dataset.

查看原文本刊更多论文

相机-雷达融合与光线约束交叉注意鲁棒三维目标检测

鲁棒的3D目标检测对于安全的自动驾驶至关重要。相机和雷达传感器是协同的，因为它们捕获互补的信息，在不同的环境条件下都能很好地工作。然而，融合相机和雷达数据具有挑战性，因为每个传感器都缺乏垂直轴上的信息，也就是说，相机不知道深度，雷达不知道仰角。我们提出了相机-雷达匹配网络CramNet，这是一种在联合三维空间中融合相机和雷达传感器读数的有效方法。为了利用雷达距离测量来更好地预测相机深度，我们提出了一种新的射线约束交叉注意机制，该机制解决了相机特征和雷达特征之间几何对应关系的模糊性。我们的方法支持传感器模态dropout训练，即使在车辆上的相机或雷达传感器突然发生故障时，也能实现鲁棒的3D物体检测。我们通过在辐射数据集(为数不多的提供雷达射频图像的大型数据集之一)上的大量实验证明了我们的融合方法的有效性。在Waymo开放数据集上，我们的方法的一个仅用于相机的变体在单目3D物体检测方面取得了具有竞争力的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision

自引率

0.00%

发文量