Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?

Torsten Sattler, A. Torii, Josef Sivic, M. Pollefeys, Hajime Taira, M. Okutomi, T. Pajdla

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6175-6184, 2017. DOI: 10.1109/CVPR.2017.654
Accurate visual localization is a key technology for autonomous navigation. 3D structure-based methods employ 3D models of the scene to estimate the full 6DOF pose of a camera very accurately. However, constructing (and extending) large-scale 3D models is still a significant challenge. In contrast, 2D image retrieval-based methods only require a database of geo-tagged images, which is trivial to construct and maintain. They are often considered inaccurate since they only approximate the positions of the cameras. Yet, the exact camera pose can theoretically be recovered when enough relevant database images are retrieved. In this paper, we demonstrate experimentally that large-scale 3D models are not strictly necessary for accurate visual localization. We create reference poses for a large and challenging urban dataset. Using these poses, we show that combining image-based methods with local reconstructions results in a pose accuracy similar to that of state-of-the-art structure-based methods. Our results suggest that we might want to reconsider the current approach for accurate large-scale localization.
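The pipeline outlined in the abstract (retrieve relevant geo-tagged database images, build a small local reconstruction from them, then register the query against that local model) can be illustrated with a minimal sketch of the final registration stage. This is not the authors' implementation: it assumes a local 3D point cloud with associated descriptors is already available (e.g., from an SfM tool), and the function and parameter names (localize_query, local_points3d, K, the 0.8 ratio threshold, the 4-pixel reprojection error) are illustrative choices. OpenCV's SIFT features and PnP+RANSAC solver stand in for whatever features and solver a real system might use.

```python
import numpy as np
import cv2

def localize_query(query_gray, local_points3d, local_descriptors, K):
    """Estimate a 6DOF pose for a query image against a small local model.

    query_gray:         grayscale query image (numpy array).
    local_points3d:     (N, 3) 3D points from a local reconstruction built
                        from the retrieved database images (assumed given).
    local_descriptors:  (N, 128) SIFT descriptors, one per 3D point.
    K:                  (3, 3) intrinsic matrix of the query camera.
    """
    # Extract local features from the query image.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(query_gray, None)

    # 2D-3D matching with Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(descriptors, local_descriptors.astype(np.float32), k=2)
    good = [p[0] for p in knn if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
    if len(good) < 6:
        return None  # not enough correspondences to estimate a pose

    pts2d = np.float32([keypoints[m.queryIdx].pt for m in good])
    pts3d = np.float32([local_points3d[m.trainIdx] for m in good])

    # Robust absolute pose estimation (PnP inside RANSAC).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=4.0, confidence=0.999)
    if not ok:
        return None

    R, _ = cv2.Rodrigues(rvec)      # rotation: world -> camera
    center = -R.T @ tvec            # camera position in world coordinates
    return R, tvec, center, inliers
```

The point of the sketch is that once enough relevant database images are retrieved, an exact 6DOF pose can be recovered from an on-the-fly local model, without ever building or maintaining a single large-scale 3D reconstruction.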