Learning to Detect Scene Landmarks for Camera Localization
Tien Do, O. Mikšík, Joseph DeGol, Hyunjong Park, Sudipta N. Sinha
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
DOI: 10.1109/CVPR52688.2022.01085
Citations: 11
Abstract
Modern camera localization methods that use image retrieval, feature matching, and 3D structure-based pose estimation require long-term storage of numerous scene images or a vast number of image features. This can make them unsuitable for resource-constrained VR/AR devices and also raises serious privacy concerns. We present a new learned camera localization technique that eliminates the need to store features or a detailed 3D point cloud. Our key idea is to implicitly encode the appearance of a sparse yet salient set of 3D scene points into a convolutional neural network (CNN) that can detect these scene points in query images whenever they are visible. We refer to these points as scene landmarks. We also show that a CNN can be trained to regress bearing vectors for such landmarks even when they are not within the camera's field of view. We demonstrate that the predicted landmarks yield accurate pose estimates and that our method outperforms DSAC*, the state of the art in learned localization. Furthermore, extending HLoc, an accurate feature-matching pipeline, by combining its correspondences with our predictions boosts its accuracy even further.
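To make the pipeline the abstract describes concrete, the sketch below illustrates one plausible inference path: a CNN produces one heatmap per scene landmark, visible landmarks are read off as heatmap peaks, and the resulting 2D detections are matched to the landmarks' known 3D coordinates and passed to a standard PnP + RANSAC solver to recover the camera pose. This is a minimal illustration, not the authors' implementation; the heatmap representation, the `threshold` value, and all function names here are assumptions, and the paper's actual architecture and robust estimator may differ.

```python
import numpy as np
import cv2


def detect_landmarks(heatmaps, threshold=0.3):
    """Extract 2D landmark detections from per-landmark CNN heatmaps.

    heatmaps: (L, H, W) array, one channel per scene landmark.
    Returns the indices of detected landmarks and their (x, y) peaks.
    """
    ids, points2d = [], []
    for i, hm in enumerate(heatmaps):
        if hm.max() < threshold:  # landmark likely not visible in this view
            continue
        y, x = np.unravel_index(hm.argmax(), hm.shape)
        ids.append(i)
        points2d.append((float(x), float(y)))
    return ids, points2d


def estimate_pose(points2d, points3d, K):
    """Recover camera pose from 2D detections and the corresponding
    known 3D landmark coordinates via PnP with RANSAC."""
    pts2d = np.asarray(points2d, dtype=np.float64).reshape(-1, 1, 2)
    pts3d = np.asarray(points3d, dtype=np.float64).reshape(-1, 1, 3)
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=4.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix from axis-angle vector
    return R, tvec
```

Note that only the landmarks' 3D coordinates and the network weights need to be stored at query time, which is the storage and privacy advantage the abstract claims over methods that retain scene images or dense feature maps.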