Augmenting Vision Queries with RADAR for BEV Detection in Autonomous Driving

Apoorv Singh
{"title":"基于雷达增强视觉查询的自动驾驶车辆检测","authors":"Apoorv Singh","doi":"10.1109/CAI54212.2023.00031","DOIUrl":null,"url":null,"abstract":"In order to build an autonomous driving platform at scale, we need to have an affordable sensor stack that provides holistic scene information with just enough information for estimating right depth and semantics of the dynamic scene. Cameras - RADARs came out to be the only combination of the sensor stack to fulfill above two conditions, since LiDARs are too expensive and other sensors like Ultra-sound sensors have extremely short range. However, there is a limited work around radar fused with vision, compared to LiDAR fused with vision work. In this paper we target to fuse RADAR detections to the vision’s object-proposals in the transformers-based state-of-the-art Vision-only networks. Vision-only networks are hypothesized to classify objects very well but they lack behind in depth estimation of the detected objects. In this paper, we hypothesize that adding in radar detections as a query in a transformers decoder along with the pre-learned vision queries from the training data-set can help improving overall recall as well as depth and velocity estimates of the detections.","PeriodicalId":129324,"journal":{"name":"2023 IEEE Conference on Artificial Intelligence (CAI)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Augmenting Vision Queries with RADAR for BEV Detection in Autonomous Driving\",\"authors\":\"Apoorv Singh\",\"doi\":\"10.1109/CAI54212.2023.00031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to build an autonomous driving platform at scale, we need to have an affordable sensor stack that provides holistic scene information with just enough information for estimating right depth and semantics of the dynamic scene. Cameras - RADARs came out to be the only combination of the sensor stack to fulfill above two conditions, since LiDARs are too expensive and other sensors like Ultra-sound sensors have extremely short range. However, there is a limited work around radar fused with vision, compared to LiDAR fused with vision work. In this paper we target to fuse RADAR detections to the vision’s object-proposals in the transformers-based state-of-the-art Vision-only networks. Vision-only networks are hypothesized to classify objects very well but they lack behind in depth estimation of the detected objects. 
In this paper, we hypothesize that adding in radar detections as a query in a transformers decoder along with the pre-learned vision queries from the training data-set can help improving overall recall as well as depth and velocity estimates of the detections.\",\"PeriodicalId\":129324,\"journal\":{\"name\":\"2023 IEEE Conference on Artificial Intelligence (CAI)\",\"volume\":\"84 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Conference on Artificial Intelligence (CAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CAI54212.2023.00031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference on Artificial Intelligence (CAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAI54212.2023.00031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

To build an autonomous driving platform at scale, we need an affordable sensor stack that provides holistic scene information, with just enough detail to estimate the depth and semantics of a dynamic scene. Cameras and RADARs turn out to be the only sensor combination that fulfills both conditions, since LiDARs are too expensive and other sensors, such as ultrasound sensors, have extremely short range. However, there is limited work on fusing radar with vision compared to work on fusing LiDAR with vision. In this paper, we fuse RADAR detections with the object proposals of state-of-the-art transformer-based vision-only networks. Vision-only networks classify objects well, but they lag behind in estimating the depth of detected objects. We hypothesize that adding radar detections as queries in the transformer decoder, alongside the vision queries pre-learned from the training dataset, can improve overall recall as well as the depth and velocity estimates of the detections.
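
The abstract gives no implementation details, so the following PyTorch sketch is only one plausible reading of "adding radar detections as queries in the transformer decoder alongside pre-learned vision queries". The module names, the 4-dimensional radar encoding (x, y, radial velocity, radar cross-section), and all query counts and sizes are assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch: radar detections projected into query embeddings and
# concatenated with learned (DETR-style) vision queries before a transformer
# decoder that attends over camera features.
import torch
import torch.nn as nn


class RadarAugmentedDecoder(nn.Module):
    def __init__(self, d_model=256, n_vision_queries=900, n_heads=8, n_layers=6):
        super().__init__()
        # Pre-learned vision queries, trained end-to-end as in DETR-style detectors.
        self.vision_queries = nn.Embedding(n_vision_queries, d_model)
        # Assumed MLP that lifts raw radar detections
        # (x, y, radial velocity, RCS) into query embeddings.
        self.radar_proj = nn.Sequential(
            nn.Linear(4, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)

    def forward(self, image_features, radar_detections):
        # image_features: (B, N_tokens, d_model) from the camera backbone.
        # radar_detections: (B, N_radar, 4) raw radar points.
        B = image_features.size(0)
        vis_q = self.vision_queries.weight.unsqueeze(0).expand(B, -1, -1)
        radar_q = self.radar_proj(radar_detections)
        # Radar-derived queries join the learned vision queries in one set,
        # so every decoder layer cross-attends to the image features for both.
        queries = torch.cat([vis_q, radar_q], dim=1)
        return self.decoder(queries, image_features)  # (B, N_vis + N_radar, d_model)
```

A detection head (classification plus box and velocity regression) would then run on each output embedding, as in DETR-style detectors; the radar-seeded queries start from measured positions and radial velocities, which is what the abstract credits for the improved recall and depth/velocity estimates.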