3D Object Detection and Tracking Using Monocular Camera in CARLA
Yanyu Zhang, Jiahao Song, Shuwei Li
2021 IEEE International Conference on Electro Information Technology (EIT), published 2021-05-14
DOI: 10.1109/EIT51626.2021.9491905
Citations: 0
Abstract
Vehicle 3D extents and trajectories are crucial cues for many autonomous driving tasks, such as path planning and motion prediction [1]. This paper proposes a novel online deep learning framework that tackles the 3D object detection and tracking problem using a monocular camera. The framework estimates complete 3D information from a sequence of 2D images and associates objects over time. Using continuous frames from the front camera, our network robustly tracks the 3D bounding box of each observation and provides its location P, orientation θ, dimensions D, and the 2D projection c of its 3D center. The training dataset is generated with the CARLA simulator, and the network is trained using Faster R-CNN [2] on an NVIDIA RTX 2080 Super GPU. Test accuracy for 2D vehicle and center detection reaches 95%, and 3D tracking performance reaches 81% MOTP [3] in the CARLA [4] environment.
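The box parameters and the tracking metric named in the abstract can be made concrete with a short sketch. It is not the authors' implementation. It assumes a KITTI-like camera frame (x right, y down, z forward), treats P as the 3D box center, θ as yaw about the camera y-axis, and D as (height, width, length), and uses hypothetical pinhole intrinsics fx, fy, cx, cy. The helper names box_corners, project and motp are illustrative. box_corners builds the eight box corners from (P, θ, D), project maps a camera-frame point such as P to pixels to obtain c, and motp follows the CLEAR MOT precision definition (sum of per-match scores divided by the number of matches). Because the abstract reports MOTP as a percentage, an overlap-style score in [0, 1] is assumed; the abstract does not state which score was used.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def box_corners(P: Vec3, theta: float, D: Vec3) -> List[Vec3]:
    """Eight corners of a 3D box in an assumed KITTI-like camera frame.

    Assumptions (hypothetical): x right, y down, z forward; P is the box
    center; theta is yaw about the camera y-axis; D = (h, w, l).
    """
    h, w, l = D
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx in (-l / 2, l / 2):          # length along the object's x-axis
        for dy in (-h / 2, h / 2):      # height along the camera y-axis
            for dz in (-w / 2, w / 2):  # width along the object's z-axis
                # Rotate the local offset about the y-axis, then translate by P.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((P[0] + x, P[1] + dy, P[2] + z))
    return corners


def project(point: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def motp(scores_per_frame: Sequence[Sequence[float]]) -> float:
    """CLEAR MOT precision: total per-match score divided by total matches.

    Each inner sequence holds one score per matched (ground truth, track)
    pair in a frame; an overlap score in [0, 1] such as IoU is assumed.
    """
    total = sum(sum(frame) for frame in scores_per_frame)
    matches = sum(len(frame) for frame in scores_per_frame)
    return total / matches if matches else 0.0


if __name__ == "__main__":
    P = (2.0, 1.0, 20.0)                   # hypothetical vehicle center, metres
    theta = math.radians(30.0)             # hypothetical yaw
    D = (1.5, 1.8, 4.5)                    # hypothetical (h, w, l), metres
    fx = fy = 800.0                        # hypothetical focal lengths, pixels
    cx, cy = 640.0, 360.0                  # hypothetical principal point

    print(project(P, fx, fy, cx, cy))      # 2D center c: (720.0, 400.0)
    print([project(p, fx, fy, cx, cy) for p in box_corners(P, theta, D)])
    print(motp([[0.75, 0.5], [1.0]]))      # 0.75
```

Because the corner offsets are symmetric, the eight corners average back to P, so projecting P directly gives the 2D center c. CARLA's own world axes differ from the camera frame assumed here, so a frame conversion (not shown) would be needed before applying this to simulator output.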