An End-to-end Learning Framework for Visual Camera Relocalization Using RGB and RGB-D Images

Kai Zhang, Xiaolin Meng, Qing Wang
Measurement Science and Technology, published 2024-05-22. DOI: 10.1088/1361-6501/ad4f02
Citations: 0

Abstract

Camera relocalization plays a vital role in machine perception, robotics, and augmented reality. Structure-based direct learning methods regress scene coordinates and then use them for camera pose estimation. However, this two-stage learning of scene coordinate regression and pose estimation can discard some of the scene coordinate knowledge during training of the final pose estimator, thereby reducing pose accuracy. This paper introduces an end-to-end learning framework for visual camera relocalization that employs both RGB and RGB-D images. By integrating scene coordinate regression and pose estimation into concurrent inner and outer loops within a single training phase, the framework notably improves pose estimation accuracy. It is also flexible: it can be trained with or without depth cues and requires only a single RGB image at test time. Empirical evaluation shows state-of-the-art precision, with an average pose accuracy of 0.019 m and 0.74° on the indoor 7Scenes dataset, and 0.162 m and 0.30° on the outdoor Cambridge Landmarks dataset.
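The abstract describes recovering a camera pose from regressed scene coordinates. As a minimal illustration of that final step only (not the paper's learned pipeline, and with entirely synthetic data), the sketch below recovers a rigid pose from 3D-3D correspondences between camera-frame points and scene coordinates using the classic Kabsch/SVD alignment:

```python
import numpy as np

def kabsch_pose(cam_pts, scene_pts):
    """Estimate rotation R and translation t such that R @ cam + t ~= scene.

    Standard Kabsch alignment on centered 3D-3D correspondences,
    with a determinant correction to avoid returning a reflection.
    """
    cc = cam_pts.mean(axis=0)
    sc = scene_pts.mean(axis=0)
    H = (cam_pts - cc).T @ (scene_pts - sc)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # +1 or -1
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # proper rotation
    t = sc - R @ cc
    return R, t

# Synthetic check: invent a ground-truth pose, transform random
# camera-frame points into "scene coordinates", then recover the pose.
rng = np.random.default_rng(0)
cam = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
scene = cam @ R_true.T + t_true

R_est, t_est = kabsch_pose(cam, scene)
```

In the paper's end-to-end setting this alignment would sit inside the training loop rather than being run once on noise-free points; the sketch only shows the geometric relationship between scene coordinates and pose.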