Map-free Visual Relocalization: Metric Pose Relative to a Single Image

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. Pub Date: 2022-10-11. DOI: 10.48550/arXiv.2210.05494

Eduardo Arnold, Jamie Wynn, S. Vicente, Guillermo Garcia-Hernando, Áron Monszpart, V. Prisacariu, Daniyar Turmukhambetov, Eric Brachmann

Pages: 690-708

Citations: 9

Abstract

Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.
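The second baseline family, feature matching combined with single-image depth prediction, relies on a known geometric fact: two-view matching alone gives the relative translation only up to scale, and a metric depth estimate can fix that scale. The following is a minimal sketch of one way this could be done, not the authors' implementation. It assumes the rotation `R` and unit translation direction `t_dir` have already been obtained (for example from an essential-matrix decomposition with the sign resolved), that both images share the intrinsics `K`, and that `depth_ref` holds predicted metric z-depths at the matched reference pixels. The function names `to_normalized_rays`, `recover_metric_scale`, and `project` are hypothetical.

```python
import numpy as np


def to_normalized_rays(K, pixels):
    """Back-project (N, 2) pixel coordinates to (N, 3) rays with z = 1."""
    homogeneous = np.hstack([pixels, np.ones((len(pixels), 1))])
    return homogeneous @ np.linalg.inv(K).T


def recover_metric_scale(K, R, t_dir, pix_ref, depth_ref, pix_query):
    """Estimate s so that the query pose is (R, s * t_dir) in metric units.

    Each match must satisfy ray_query x (R @ X_ref + s * t_dir) = 0, where
    X_ref is the reference pixel lifted to 3D with its metric depth.
    This gives one least-squares estimate of s per match; the median over
    matches is returned for robustness to bad depths or matches.
    """
    rays_ref = to_normalized_rays(K, pix_ref)
    rays_query = to_normalized_rays(K, pix_query)
    X_ref = rays_ref * depth_ref[:, None]      # metric points, reference frame
    rotated = X_ref @ R.T                      # R @ X_ref for every point
    a = np.cross(rays_query, t_dir)            # coefficient of s
    b = np.cross(rays_query, rotated)          # constant term
    denom = np.einsum("ij,ij->i", a, a)
    valid = denom > 1e-12                      # skip rays parallel to t_dir
    per_match = -np.einsum("ij,ij->i", a, b)[valid] / denom[valid]
    return float(np.median(per_match))


def project(K, X):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = X @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


# Synthetic check with made-up values: known pose, perfect depth and matches.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.3, -0.1, 0.2])            # metric translation (assumed)
X_ref = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(50, 3))
X_query = X_ref @ R.T + t_true

pix_ref = project(K, X_ref)
pix_query = project(K, X_query)
t_dir = t_true / np.linalg.norm(t_true)        # what two-view geometry yields
scale = recover_metric_scale(K, R, t_dir, pix_ref, X_ref[:, 2], pix_query)
print("recovered scale:", scale, "true scale:", np.linalg.norm(t_true))
```

With real data the predicted depths and matches are noisy, so the per-match estimates disagree; the median is one simple aggregation choice among several possible robust estimators.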

