{"title":"基于雷达增强视觉查询的自动驾驶车辆检测","authors":"Apoorv Singh","doi":"10.1109/CAI54212.2023.00031","DOIUrl":null,"url":null,"abstract":"In order to build an autonomous driving platform at scale, we need to have an affordable sensor stack that provides holistic scene information with just enough information for estimating right depth and semantics of the dynamic scene. Cameras - RADARs came out to be the only combination of the sensor stack to fulfill above two conditions, since LiDARs are too expensive and other sensors like Ultra-sound sensors have extremely short range. However, there is a limited work around radar fused with vision, compared to LiDAR fused with vision work. In this paper we target to fuse RADAR detections to the vision’s object-proposals in the transformers-based state-of-the-art Vision-only networks. Vision-only networks are hypothesized to classify objects very well but they lack behind in depth estimation of the detected objects. In this paper, we hypothesize that adding in radar detections as a query in a transformers decoder along with the pre-learned vision queries from the training data-set can help improving overall recall as well as depth and velocity estimates of the detections.","PeriodicalId":129324,"journal":{"name":"2023 IEEE Conference on Artificial Intelligence (CAI)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Augmenting Vision Queries with RADAR for BEV Detection in Autonomous Driving\",\"authors\":\"Apoorv Singh\",\"doi\":\"10.1109/CAI54212.2023.00031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to build an autonomous driving platform at scale, we need to have an affordable sensor stack that provides holistic scene information with just enough information for estimating right depth and semantics of the dynamic scene. Cameras - RADARs came out to be the only combination of the sensor stack to fulfill above two conditions, since LiDARs are too expensive and other sensors like Ultra-sound sensors have extremely short range. However, there is a limited work around radar fused with vision, compared to LiDAR fused with vision work. In this paper we target to fuse RADAR detections to the vision’s object-proposals in the transformers-based state-of-the-art Vision-only networks. Vision-only networks are hypothesized to classify objects very well but they lack behind in depth estimation of the detected objects. 
In this paper, we hypothesize that adding in radar detections as a query in a transformers decoder along with the pre-learned vision queries from the training data-set can help improving overall recall as well as depth and velocity estimates of the detections.\",\"PeriodicalId\":129324,\"journal\":{\"name\":\"2023 IEEE Conference on Artificial Intelligence (CAI)\",\"volume\":\"84 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE Conference on Artificial Intelligence (CAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CAI54212.2023.00031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Conference on Artificial Intelligence (CAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAI54212.2023.00031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Augmenting Vision Queries with RADAR for BEV Detection in Autonomous Driving
To build an autonomous driving platform at scale, we need an affordable sensor stack that provides holistic scene information with just enough detail to estimate the depth and semantics of a dynamic scene. Cameras combined with RADARs turn out to be the only sensor combination that satisfies both conditions, since LiDARs are too expensive and other sensors such as ultrasound have extremely short range. However, there is limited work on fusing radar with vision compared to the body of work on fusing LiDAR with vision. In this paper we fuse RADAR detections into the object proposals of state-of-the-art transformer-based vision-only networks. Vision-only networks classify objects well, but they lag behind in estimating the depth of the detected objects. We hypothesize that adding radar detections as queries in a transformer decoder, alongside the vision queries pre-learned from the training dataset, can improve overall recall as well as the depth and velocity estimates of the detections.
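The core idea lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration of the mechanism the abstract describes: radar detections are projected into the query space of a DETR-style transformer decoder and concatenated with the pre-learned vision queries before attending to image features. The module names, dimensions, radar feature layout (e.g. position, velocity, radar cross-section), and output heads are all assumptions for illustration, not the paper's actual implementation.

```python
# A minimal sketch (assumed architecture, not the paper's implementation):
# radar detections become extra decoder queries alongside learned vision queries.
import torch
import torch.nn as nn


class RadarAugmentedDecoder(nn.Module):
    def __init__(self, d_model=256, num_vision_queries=100,
                 num_heads=8, num_layers=6, radar_dim=5):
        super().__init__()
        # Pre-learned vision queries, as in DETR-style detectors.
        self.vision_queries = nn.Embedding(num_vision_queries, d_model)
        # Project raw radar detections (e.g. x, y, vx, vy, RCS) into query space.
        self.radar_proj = nn.Linear(radar_dim, d_model)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        # Illustrative output heads: class logits and BEV box + velocity regression.
        self.cls_head = nn.Linear(d_model, 10)
        self.reg_head = nn.Linear(d_model, 9)

    def forward(self, image_feats, radar_dets):
        # image_feats: (B, N_tokens, d_model) flattened camera/BEV feature tokens
        # radar_dets:  (B, N_radar, radar_dim) per-frame radar detections
        B = image_feats.size(0)
        vis_q = self.vision_queries.weight.unsqueeze(0).expand(B, -1, -1)
        radar_q = self.radar_proj(radar_dets)
        # Concatenate radar-derived queries with the learned vision queries,
        # then let the decoder attend over the image features.
        queries = torch.cat([vis_q, radar_q], dim=1)
        hs = self.decoder(tgt=queries, memory=image_feats)
        return self.cls_head(hs), self.reg_head(hs)


# Example usage with dummy tensors:
model = RadarAugmentedDecoder()
img = torch.randn(2, 900, 256)    # flattened image/BEV feature tokens
radar = torch.randn(2, 30, 5)     # 30 radar detections per sample
logits, boxes = model(img, radar)
print(logits.shape, boxes.shape)  # (2, 130, 10) (2, 130, 9)
```

The intuition behind this design, as the abstract frames it, is that radar-derived queries anchor some of the decoder's proposals at physically measured ranges and radial velocities, which the vision-only queries struggle to estimate on their own.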