{"title":"基于雷达增强视觉查询的自动驾驶车辆检测","authors":"Apoorv Singh","doi":"10.1109/CAI54212.2023.00031","DOIUrl":null,"url":null,"abstract":"In order to build an autonomous driving platform at scale, we need to have an affordable sensor stack that provides holistic scene information with just enough information for estimating right depth and semantics of the dynamic scene. Cameras - RADARs came out to be the only combination of the sensor stack to fulfill above two conditions, since LiDARs are too expensive and other sensors like Ultra-sound sensors have extremely short range. However, there is a limited work around radar fused with vision, compared to LiDAR fused with vision work. In this paper we target to fuse RADAR detections to the vision’s object-proposals in the transformers-based state-of-the-art Vision-only networks. Vision-only networks are hypothesized to classify objects very well but they lack behind in depth estimation of the detected objects. In this paper, we hypothesize that adding in radar detections as a query in a transformers decoder along with the pre-learned vision queries from the training data-set can help improving overall recall as well as depth and velocity estimates of the detections.","PeriodicalId":129324,"journal":{"name":"2023 IEEE Conference on Artificial Intelligence (CAI)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Augmenting Vision Queries with RADAR for BEV Detection in Autonomous Driving\",\"authors\":\"Apoorv Singh\",\"doi\":\"10.1109/CAI54212.2023.00031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to build an autonomous driving platform at scale, we need to have an affordable sensor stack that provides holistic scene information with just enough information for estimating right depth and semantics of the dynamic scene. Cameras - RADARs came out to be the only combination of the sensor stack to fulfill above two conditions, since LiDARs are too expensive and other sensors like Ultra-sound sensors have extremely short range. However, there is a limited work around radar fused with vision, compared to LiDAR fused with vision work. In this paper we target to fuse RADAR detections to the vision’s object-proposals in the transformers-based state-of-the-art Vision-only networks. Vision-only networks are hypothesized to classify objects very well but they lack behind in depth estimation of the detected objects. 
In this paper, we hypothesize that adding in radar detections as a query in a transformers decoder along with the pre-learned vision queries from the training data-set can help improving overall recall as well as depth and velocity estimates of the detections.\",\"PeriodicalId\":129324,\"journal\":{\"name\":\"2023 IEEE Conference on Artificial Intelligence (CAI)\",\"volume\":\"84 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Conference on Artificial Intelligence (CAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CAI54212.2023.00031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference on Artificial Intelligence (CAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAI54212.2023.00031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Augmenting Vision Queries with RADAR for BEV Detection in Autonomous Driving
To build an autonomous driving platform at scale, we need an affordable sensor stack that provides holistic scene information with just enough detail to estimate the depth and semantics of a dynamic scene. Cameras combined with RADARs turn out to be the only sensor combination that satisfies both conditions, since LiDARs are too expensive and other sensors such as ultrasound have extremely short range. However, there is limited work on fusing radar with vision compared to the body of work on fusing LiDAR with vision. In this paper we fuse RADAR detections into the object proposals of state-of-the-art transformer-based vision-only networks. Vision-only networks classify objects well, but they lag behind in estimating the depth of the detected objects. We hypothesize that adding radar detections as queries in a transformer decoder, alongside the vision queries pre-learned from the training dataset, can improve overall recall as well as the depth and velocity estimates of the detections.
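The core idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration of the mechanism the abstract describes: radar detections are projected into the query space of a DETR-style transformer decoder and concatenated with the pre-learned vision queries before attending to image features. The module names, dimensions, radar feature layout (e.g. position, velocity, radar cross-section), and output heads are all assumptions for illustration, not the paper's actual implementation.

```python
# A minimal sketch (assumed architecture, not the paper's implementation):
# radar detections become extra decoder queries alongside learned vision queries.
import torch
import torch.nn as nn


class RadarAugmentedDecoder(nn.Module):
    def __init__(self, d_model=256, num_vision_queries=100,
                 num_heads=8, num_layers=6, radar_dim=5):
        super().__init__()
        # Pre-learned vision queries, as in DETR-style detectors.
        self.vision_queries = nn.Embedding(num_vision_queries, d_model)
        # Project raw radar detections (e.g. x, y, vx, vy, RCS) into query space.
        self.radar_proj = nn.Linear(radar_dim, d_model)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        # Illustrative output heads: class logits and BEV box + velocity regression.
        self.cls_head = nn.Linear(d_model, 10)
        self.reg_head = nn.Linear(d_model, 9)

    def forward(self, image_feats, radar_dets):
        # image_feats: (B, N_tokens, d_model) flattened camera/BEV feature tokens
        # radar_dets:  (B, N_radar, radar_dim) per-frame radar detections
        B = image_feats.size(0)
        vis_q = self.vision_queries.weight.unsqueeze(0).expand(B, -1, -1)
        radar_q = self.radar_proj(radar_dets)
        # Concatenate radar-derived queries with the learned vision queries,
        # then let the decoder attend over the image features.
        queries = torch.cat([vis_q, radar_q], dim=1)
        hs = self.decoder(tgt=queries, memory=image_feats)
        return self.cls_head(hs), self.reg_head(hs)


# Example usage with dummy tensors:
model = RadarAugmentedDecoder()
img = torch.randn(2, 900, 256)    # flattened image/BEV feature tokens
radar = torch.randn(2, 30, 5)     # 30 radar detections per sample
logits, boxes = model(img, radar)
print(logits.shape, boxes.shape)  # (2, 130, 10) (2, 130, 9)
```

The intuition behind this design, as the abstract frames it, is that radar-derived queries anchor some of the decoder's proposals at physically measured ranges and radial velocities, which the vision-only queries struggle to estimate on their own.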