Segment2Regress: Monocular 3D Vehicle Localization in Two Stages

Jaesung Choe, Kyungdon Joo, François Rameau, Gyumin Shim, I. Kweon
{"title":"回归:单目3D车辆定位的两个阶段","authors":"Jaesung Choe, Kyungdon Joo, François Rameau, Gyumin Shim, I. Kweon","doi":"10.15607/RSS.2019.XV.016","DOIUrl":null,"url":null,"abstract":"High-quality depth information is required to perform 3D vehicle detection, consequently, there exists a large performance gap between camera and LiDAR-based approaches. In this paper, our monocular camera-based 3D vehicle localization method alleviates the dependency on high-quality depth maps by taking advantage of the commonly accepted assumption that the observed vehicles lie on the road surface. We propose a two-stage approach that consists of a segment network and a regression network, called Segment2Regress. For a given single RGB image and a prior 2D object detection bounding box, the two stages are as follows: 1) The segment network activates the pixels under the vehicle (modeled as four line segments and a quadrilateral representing the area beneath the vehicle projected on the image coordinate). These segments are trained to lie on the road plane such that our network does not require full depth estimation. Instead, the depth is directly approximated from the known ground plane parameters. 2) The regression network takes the segments fused with the plane depth to predict the 3D location of a car at the ground level. To stabilize the regression, we introduce a coupling loss that enforces structural constraints. The efficiency, accuracy, and robustness of the proposed technique are highlighted through a series of experiments and ablation assessments. These tests are conducted on the KITTI bird’s eye view dataset where Segment2Regress demonstrates state-of-the-art performance. Further results are available at https://github.com/LifeBeyondExpectations/Segment2Regress","PeriodicalId":307591,"journal":{"name":"Robotics: Science and Systems XV","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Segment2Regress: Monocular 3D Vehicle Localization in Two Stages\",\"authors\":\"Jaesung Choe, Kyungdon Joo, François Rameau, Gyumin Shim, I. Kweon\",\"doi\":\"10.15607/RSS.2019.XV.016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High-quality depth information is required to perform 3D vehicle detection, consequently, there exists a large performance gap between camera and LiDAR-based approaches. In this paper, our monocular camera-based 3D vehicle localization method alleviates the dependency on high-quality depth maps by taking advantage of the commonly accepted assumption that the observed vehicles lie on the road surface. We propose a two-stage approach that consists of a segment network and a regression network, called Segment2Regress. For a given single RGB image and a prior 2D object detection bounding box, the two stages are as follows: 1) The segment network activates the pixels under the vehicle (modeled as four line segments and a quadrilateral representing the area beneath the vehicle projected on the image coordinate). These segments are trained to lie on the road plane such that our network does not require full depth estimation. Instead, the depth is directly approximated from the known ground plane parameters. 2) The regression network takes the segments fused with the plane depth to predict the 3D location of a car at the ground level. To stabilize the regression, we introduce a coupling loss that enforces structural constraints. 
The efficiency, accuracy, and robustness of the proposed technique are highlighted through a series of experiments and ablation assessments. These tests are conducted on the KITTI bird’s eye view dataset where Segment2Regress demonstrates state-of-the-art performance. Further results are available at https://github.com/LifeBeyondExpectations/Segment2Regress\",\"PeriodicalId\":307591,\"journal\":{\"name\":\"Robotics: Science and Systems XV\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics: Science and Systems XV\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15607/RSS.2019.XV.016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics: Science and Systems XV","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15607/RSS.2019.XV.016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 6

Abstract

3D vehicle detection requires high-quality depth information; consequently, there is a large performance gap between camera-based and LiDAR-based approaches. In this paper, our monocular camera-based 3D vehicle localization method alleviates the dependency on high-quality depth maps by taking advantage of the commonly accepted assumption that the observed vehicles lie on the road surface. We propose a two-stage approach, called Segment2Regress, that consists of a segment network and a regression network. Given a single RGB image and a prior 2D object-detection bounding box, the two stages are as follows: 1) The segment network activates the pixels under the vehicle, modeled as four line segments and a quadrilateral representing the area beneath the vehicle projected onto the image plane. These segments are trained to lie on the road plane, so our network does not require full depth estimation; instead, depth is approximated directly from the known ground-plane parameters. 2) The regression network takes the segments fused with the plane depth and predicts the 3D location of the car at ground level. To stabilize the regression, we introduce a coupling loss that enforces structural constraints. The efficiency, accuracy, and robustness of the proposed technique are demonstrated through a series of experiments and ablation assessments conducted on the KITTI bird's-eye-view dataset, where Segment2Regress demonstrates state-of-the-art performance. Further results are available at https://github.com/LifeBeyondExpectations/Segment2Regress
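
The key geometric step above, recovering depth from known ground-plane parameters, reduces to a ray-plane intersection rather than learned depth estimation. The sketch below is a minimal illustration of that step under standard pinhole assumptions, not the authors' implementation; `plane_depth` and the KITTI-like numbers in the usage lines are assumptions introduced here.

```python
import numpy as np

def plane_depth(u, v, K, n, d):
    """Depth of pixel (u, v) obtained by intersecting its viewing ray
    with the ground plane {X : n.X + d = 0} in camera coordinates.
    Hypothetical helper: K is the 3x3 pinhole intrinsic matrix; n, d
    are the 'known ground plane parameters' the abstract refers to."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-projected viewing ray
    denom = float(n @ ray)
    if abs(denom) < 1e-9:
        raise ValueError("viewing ray is (near-)parallel to the ground plane")
    lam = -d / denom   # ray scale at the plane: n.(lam * ray) + d = 0
    X = lam * ray      # 3D ground point hit by the pixel
    return X[2], X     # depth (z component) and the full 3D point

# Usage with KITTI-like numbers (assumed): camera about 1.65 m above a flat
# road, y axis pointing down, so the plane is y = 1.65, i.e. n = (0, 1, 0),
# d = -1.65. A pixel below the horizon then maps to a unique ground point.
K = np.array([[721.5,   0.0, 609.6],
              [  0.0, 721.5, 172.9],
              [  0.0,   0.0,   1.0]])
z, X = plane_depth(700.0, 250.0, K, np.array([0.0, 1.0, 0.0]), -1.65)
```

Because every pixel below the horizon intersects the plane at a unique, strictly positive range, the segment network can skip full depth estimation entirely, as the abstract states.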
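
The abstract also mentions a coupling loss that enforces structural constraints on the regression but does not define its form. The following is a hypothetical sketch, shown purely to illustrate the idea: it couples a per-corner regression error with a term that preserves the pairwise distances among the four ground-contact corners, so the predicted quadrilateral cannot be deformed one point at a time.

```python
import torch
import torch.nn.functional as F

def coupling_loss(pred, gt, w=1.0):
    """Hypothetical coupling loss (the paper does not spell out its form).
    pred, gt: (B, 4, 2) corner coordinates on the ground plane.
    w: assumed weight balancing the two terms."""
    # Per-corner L1 regression error.
    point_term = F.l1_loss(pred, gt)
    # Structural term: pairwise corner-to-corner distances, (B, 4, 4),
    # must match those of the ground-truth quadrilateral.
    struct_term = F.l1_loss(torch.cdist(pred, pred), torch.cdist(gt, gt))
    return point_term + w * struct_term
```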