HyperMap: Compressed 3D Map for Monocular Camera Registration

Ming-Fang Chang, Joshua G. Mangelson, M. Kaess, S. Lucey
{"title":"HyperMap:用于单目相机配准的压缩3D地图","authors":"Ming-Fang Chang, Joshua G. Mangelson, M. Kaess, S. Lucey","doi":"10.1109/ICRA48506.2021.9561864","DOIUrl":null,"url":null,"abstract":"We address the problem of image registration to a compressed 3D map. While this is most often performed by comparing LiDAR scans to the point cloud based map, it depends on an expensive LiDAR sensor at run time and the large point cloud based map creates overhead in data storage and transmission. Recently, efforts have been underway to replace the expensive LiDAR sensor with cheaper cameras and perform 2D-3D localization. In contrast to the previous work that learns relative pose by comparing projected depth and camera images, we propose HyperMap, a paradigm shift from online depth map feature extraction to offline 3D map feature computation for the 2D-3D camera registration task through end-to-end training. In the proposed pipeline, we first perform offline 3D sparse convolution to extract and compress the voxelwise hypercolumn features for the whole map. Then at run-time, we project and decode the compressed map features to the rough initial camera pose to form a virtual feature image. A Convolutional Neural Network (CNN) is then used to predict the relative pose between the camera image and the virtual feature image. In addition, we propose an efficient occlusion handling layer, specifically designed for large point clouds, to remove occluded points in projection. Our experiments on synthetic and real datasets show that, by moving the feature computation load offline and compressing, we reduced map size by 87−94% while maintaining comparable or better accuracy.","PeriodicalId":108312,"journal":{"name":"2021 IEEE International Conference on Robotics and Automation (ICRA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"HyperMap: Compressed 3D Map for Monocular Camera Registration\",\"authors\":\"Ming-Fang Chang, Joshua G. Mangelson, M. Kaess, S. Lucey\",\"doi\":\"10.1109/ICRA48506.2021.9561864\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We address the problem of image registration to a compressed 3D map. While this is most often performed by comparing LiDAR scans to the point cloud based map, it depends on an expensive LiDAR sensor at run time and the large point cloud based map creates overhead in data storage and transmission. Recently, efforts have been underway to replace the expensive LiDAR sensor with cheaper cameras and perform 2D-3D localization. In contrast to the previous work that learns relative pose by comparing projected depth and camera images, we propose HyperMap, a paradigm shift from online depth map feature extraction to offline 3D map feature computation for the 2D-3D camera registration task through end-to-end training. In the proposed pipeline, we first perform offline 3D sparse convolution to extract and compress the voxelwise hypercolumn features for the whole map. Then at run-time, we project and decode the compressed map features to the rough initial camera pose to form a virtual feature image. A Convolutional Neural Network (CNN) is then used to predict the relative pose between the camera image and the virtual feature image. In addition, we propose an efficient occlusion handling layer, specifically designed for large point clouds, to remove occluded points in projection. 
Our experiments on synthetic and real datasets show that, by moving the feature computation load offline and compressing, we reduced map size by 87−94% while maintaining comparable or better accuracy.\",\"PeriodicalId\":108312,\"journal\":{\"name\":\"2021 IEEE International Conference on Robotics and Automation (ICRA)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Robotics and Automation (ICRA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRA48506.2021.9561864\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA48506.2021.9561864","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

We address the problem of image registration to a compressed 3D map. While this is most often performed by comparing LiDAR scans to a point-cloud-based map, doing so depends on an expensive LiDAR sensor at run time, and the large point-cloud map adds overhead in data storage and transmission. Recently, efforts have been underway to replace the expensive LiDAR sensor with cheaper cameras and to perform 2D-3D localization. In contrast to previous work that learns relative pose by comparing projected depth and camera images, we propose HyperMap, a paradigm shift from online depth-map feature extraction to offline 3D map feature computation for the 2D-3D camera registration task through end-to-end training. In the proposed pipeline, we first perform offline 3D sparse convolution to extract and compress the voxelwise hypercolumn features for the whole map. Then, at run time, we project and decode the compressed map features at the rough initial camera pose to form a virtual feature image. A Convolutional Neural Network (CNN) then predicts the relative pose between the camera image and the virtual feature image. In addition, we propose an efficient occlusion handling layer, specifically designed for large point clouds, to remove occluded points during projection. Our experiments on synthetic and real datasets show that, by moving the feature computation offline and compressing the result, we reduce map size by 87–94% while maintaining comparable or better accuracy.
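The offline stage can be pictured with a short sketch. The paper performs 3D sparse convolution over the map's occupied voxels; to stay dependency-free, the illustration below uses dense nn.Conv3d on a small occupancy grid instead, and the module name, channel widths, and compressed code size are assumptions, not the paper's architecture. What it shows is the hypercolumn idea: activations from several convolution depths (and hence growing receptive fields) are concatenated per voxel and then compressed into a compact per-voxel code.

```python
# Hedged sketch of offline voxelwise hypercolumn extraction + compression.
# Dense Conv3d stands in for the paper's 3D sparse convolution; names and
# sizes (HypercolumnEncoder3D, 16/32/64 channels, C=16 code) are assumed.
import torch
import torch.nn as nn

class HypercolumnEncoder3D(nn.Module):
    def __init__(self, compressed_channels: int = 16):
        super().__init__()
        self.conv1 = nn.Conv3d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv3d(16, 32, 3, padding=1)
        self.conv3 = nn.Conv3d(32, 64, 3, padding=1)
        # A 1x1x1 convolution compresses the concatenated hypercolumn
        # (16 + 32 + 64 channels) down to a compact per-voxel code.
        self.compress = nn.Conv3d(16 + 32 + 64, compressed_channels, 1)

    def forward(self, occupancy):  # (B, 1, D, H, W) voxel occupancy grid
        f1 = torch.relu(self.conv1(occupancy))
        f2 = torch.relu(self.conv2(f1))
        f3 = torch.relu(self.conv3(f2))
        # Per-voxel hypercolumn: stack activations from all three depths.
        hypercolumn = torch.cat([f1, f2, f3], dim=1)
        return self.compress(hypercolumn)  # (B, C, D, H, W) compressed map

# Example: encode a 32^3 occupancy grid once, offline.
encoder = HypercolumnEncoder3D(compressed_channels=16)
compressed_map = encoder(torch.rand(1, 1, 32, 32, 32))
```

In the actual pipeline this encoding runs once per map, offline, so its cost does not burden run-time registration.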
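At run time, the compressed features are projected to the rough initial camera pose to form the virtual feature image. The sketch below is a minimal NumPy illustration under assumed conventions (a world-to-camera pose matrix and pinhole intrinsics); its per-pixel z-buffer is a generic stand-in for occlusion handling, since the abstract does not describe the design of the paper's dedicated occlusion layer.

```python
# Hedged sketch: project per-voxel map features into a virtual feature
# image at a rough camera pose, with a z-buffer as an occlusion stand-in.
import numpy as np

def project_map_features(points_world, features, T_cam_world, K, hw):
    """points_world: (N,3) map points; features: (N,C) compressed codes;
    T_cam_world: (4,4) world->camera pose; K: (3,3) intrinsics; hw: (H,W)."""
    H, W = hw
    N, C = features.shape

    # Transform map points into the camera frame.
    pts_h = np.hstack([points_world, np.ones((N, 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    in_front = pts_cam[:, 2] > 1e-6
    pts_cam, feats = pts_cam[in_front], features[in_front]

    # Pinhole projection to pixel coordinates.
    uvz = (K @ pts_cam.T).T
    uv = uvz[:, :2] / uvz[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, feats = u[valid], v[valid], pts_cam[valid, 2], feats[valid]

    # Per-pixel z-buffer: keep the nearest point, discarding occluded ones.
    feat_img = np.zeros((H, W, C), dtype=features.dtype)
    depth = np.full((H, W), np.inf)
    order = np.argsort(-z)  # write far-to-near so nearer points overwrite
    feat_img[v[order], u[order]] = feats[order]
    depth[v[order], u[order]] = z[order]
    return feat_img, depth
```

Sorting far-to-near before the scatter write relies on NumPy's last-write-wins behavior for repeated fancy-assignment indices, so each pixel keeps the feature of its nearest visible map point.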
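Finally, a CNN predicts the relative pose between the camera image and the virtual feature image. The abstract states only that a CNN is used; the small encoder below, the number of map feature channels, and the 6-DoF translation-plus-axis-angle output parameterization are illustrative assumptions rather than the paper's network.

```python
# Hedged sketch of the relative-pose regression stage (architecture assumed).
import torch
import torch.nn as nn

class RelativePoseCNN(nn.Module):
    def __init__(self, feat_channels: int = 16):
        super().__init__()
        # Camera image (3 channels) concatenated with the virtual feature
        # image (feat_channels channels) along the channel axis.
        in_ch = 3 + feat_channels
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # 6-DoF output: 3 for translation, 3 for axis-angle rotation.
        self.head = nn.Linear(128, 6)

    def forward(self, camera_img, virtual_feat_img):
        x = torch.cat([camera_img, virtual_feat_img], dim=1)
        x = self.encoder(x).flatten(1)
        return self.head(x)  # (B, 6) relative pose correction

# Example: one 480x640 camera image paired with a C=16 virtual feature image.
model = RelativePoseCNN(feat_channels=16)
pose = model(torch.randn(1, 3, 480, 640), torch.randn(1, 16, 480, 640))
```

The predicted 6-DoF correction would then refine the rough initial pose against which the map features were projected.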