Fusion of Camera and Lidar Data for Large Scale Semantic Mapping

2019 IEEE Intelligent Transportation Systems Conference (ITSC) Pub Date : 2019-10-01 DOI:10.1109/ITSC.2019.8917107

Thomas Westfechtel, K. Ohno, R. B. Neto, Shotaro Kojima, S. Tadokoro


Citations: 7

Abstract

Current self-driving vehicles rely on detailed maps of the environment that contain exhaustive semantic information. This work presents a strategy that uses recent advances in semantic segmentation of images and fuses the information extracted from the camera stream with the accurate depth measurements of a Lidar sensor in order to create large-scale, semantically labeled point clouds of the environment. We fuse the color and semantic data gathered from a round-view camera system with the depth data gathered from a Lidar sensor. In our framework, each Lidar scan point is projected onto the camera stream to extract the color and semantic information, while at the same time a large-scale 3D map of the environment is generated by a Lidar-based SLAM algorithm. Although we employed a network that achieved state-of-the-art semantic segmentation results on the Cityscapes dataset [1] (IoU score of 82.1%), using the extracted semantic information alone achieved an IoU score of only 38.9% on 105 manually labeled 5x5 m tiles from 5 different trial runs in the city of Sendai, Japan (this decrease in accuracy is discussed in Section III-B). To increase the performance, we reclassify the label of each point. To this end, we investigated two approaches: a random forest and SparseConvNet [2] (a deep learning approach). For both methods, we investigated how including the semantic labels from the camera stream affects the classification of the 3D point cloud. We show that including them yields a significant performance increase: 25.4 percentage points for the random forest (from 40.0% without labels to 65.4% with labels) and 16.6 percentage points for SparseConvNet (from 33.4% without labels to 50.8% with labels). Finally, we present practical examples of how semantically enriched maps can be employed for further tasks. In particular, we show how some classes (such as cars and vegetation) can be removed from the point cloud in order to increase the visibility of other classes (such as road and buildings), and how the data could be used to extract the trajectories of vehicles and pedestrians.
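
The projection and class-removal steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple pinhole camera model (a round-view system may well use a different lens model), and the names `label_points`, `remove_classes`, `T_cam_lidar`, `K`, and `label_image` are hypothetical, chosen only for this example.

```python
import numpy as np

def label_points(points_lidar, T_cam_lidar, K, label_image, unlabeled=-1):
    """Give each lidar point the semantic label of the pixel it projects to.

    Assumptions (not from the paper): pinhole intrinsics K (3x3), a 4x4
    extrinsic transform T_cam_lidar from lidar frame to camera frame, and
    label_image as an (H, W) array of integer class ids.
    """
    n = points_lidar.shape[0]
    # Homogeneous coordinates, then transform lidar frame -> camera frame.
    homo = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ homo.T).T[:, :3]

    labels = np.full(n, unlabeled, dtype=np.int64)
    # Only points in front of the camera can appear in the image.
    in_front = pts_cam[:, 2] > 1e-6

    # Pinhole projection: divide by depth to get pixel coordinates.
    uvw = (K @ pts_cam[in_front].T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(np.int64)

    # Discard projections that fall outside the image bounds.
    h, w = label_image.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = label_image[v[inside], u[inside]]
    return labels

def remove_classes(points, labels, classes_to_remove):
    """Drop all points whose label is in classes_to_remove."""
    keep = ~np.isin(labels, list(classes_to_remove))
    return points[keep], labels[keep]

if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    T = np.eye(4)                      # lidar and camera frames coincide here
    label_image = np.zeros((100, 100), dtype=np.int64)
    label_image[:, 50:] = 1            # right half of the image is class 1
    pts = np.array([[0.0, 0.0, 5.0],   # projects to the image centre
                    [-1.0, 0.0, 5.0],  # projects into the left half
                    [0.0, 0.0, -5.0],  # behind the camera
                    [10.0, 0.0, 1.0]]) # outside the field of view
    labels = label_points(pts, T, K, label_image)
    kept_pts, kept_labels = remove_classes(pts, labels, {1})
    print(labels, kept_pts.shape)
```

Points that are behind the camera or project outside the image keep the placeholder label, which mirrors the practical situation where a scan point is not covered by any camera view. Filtering by label is then a single boolean mask, which is what makes class removal cheap once the map is semantically labeled.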


