An End-to-end Learning Framework for Visual Camera Relocalization Using RGB and RGB-D Images

Kai Zhang, Xiaolin Meng, Qing Wang
Measurement Science and Technology, published 2024-05-22. DOI: 10.1088/1361-6501/ad4f02
Citations: 0

Abstract

Camera relocalization plays a vital role in machine perception, robotics, and augmented reality. Structure-based direct learning methods regress scene coordinates and then use them for camera pose estimation. However, this two-stage learning of scene coordinate regression and pose estimation can discard some of the scene coordinate knowledge during training of the final pose estimator, thereby reducing pose accuracy. This paper introduces an end-to-end learning framework for visual camera relocalization that employs both RGB and RGB-D images. By integrating scene coordinate regression and pose estimation into concurrent inner and outer loops within a single training phase, the framework notably improves pose estimation accuracy. It is also flexible: it can be trained with or without depth cues and requires only a single RGB image at test time. Empirical evaluation shows state-of-the-art precision, with an average pose accuracy of 0.019 m and 0.74° on the indoor 7Scenes dataset, and 0.162 m and 0.30° on the outdoor Cambridge Landmarks dataset.
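The abstract describes recovering a camera pose from regressed scene coordinates. As a minimal illustration of that final step only (not the paper's learned pipeline, and with entirely synthetic data), the sketch below recovers a rigid pose from 3D-3D correspondences between camera-frame points and scene coordinates using the classic Kabsch/SVD alignment:

```python
import numpy as np

def kabsch_pose(cam_pts, scene_pts):
    """Estimate rotation R and translation t such that R @ cam + t ~= scene.

    Standard Kabsch alignment on centered 3D-3D correspondences,
    with a determinant correction to avoid returning a reflection.
    """
    cc = cam_pts.mean(axis=0)
    sc = scene_pts.mean(axis=0)
    H = (cam_pts - cc).T @ (scene_pts - sc)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # +1 or -1
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # proper rotation
    t = sc - R @ cc
    return R, t

# Synthetic check: invent a ground-truth pose, transform random
# camera-frame points into "scene coordinates", then recover the pose.
rng = np.random.default_rng(0)
cam = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
scene = cam @ R_true.T + t_true

R_est, t_est = kabsch_pose(cam, scene)
```

In the paper's end-to-end setting this alignment would sit inside the training loop rather than being run once on noise-free points; the sketch only shows the geometric relationship between scene coordinates and pose.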