PL-Pose: robust camera localisation based on combined point and line features using control images

The Photogrammetric Record. Published: 2024-02-28. DOI: 10.1111/phor.12481

Zhihua Xu, Yiru Niu, Yan Cui, Rongjun Qin, Wenbin Sun


Abstract

Camera localisation is an essential task in computer vision. The objective is to determine the precise position and orientation of a newly introduced camera station from a collection of georeferenced control images. Traditional feature-based approaches struggle to localise images that exhibit significant viewpoint disparities. Modern deep learning approaches, by contrast, regress camera poses directly from image content, treating the image holistically to mitigate viewpoint disparities. This paper posits that although deep networks can learn robust and invariant visual features, incorporating geometric models provides rigorous constraints during pose estimation. Following the classic structure-from-motion (SfM) pipeline, we propose the PL-Pose framework for camera localisation. First, to improve feature correlations for images with large viewpoint disparities, we combine point and line features using a deep learning framework and the geometric relations of wireframes. Then, a cost function is constructed from the combined point and line features to constrain the bundle adjustment. Finally, the camera pose parameters and 3D points are estimated through iterative optimisation. We evaluate the accuracy of PL-Pose on two datasets: the publicly available S3DIS dataset and the self-collected CUMTB_Campus dataset. The experimental results show that, in both indoor and outdoor scenes, PL-Pose achieves localisation errors of less than 1 m for 82% of the test points, whereas the best of the four comparison methods reaches only 72%. PL-Pose also recovers the camera pose parameters in all scenes, with small or large viewpoint disparities, indicating good stability and adaptability.
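The abstract does not give the form of the cost function. As a rough illustration of how point and line observations can share one bundle-adjustment objective, the Python sketch below uses a common formulation that is not necessarily the authors': the point term is the pixel reprojection error, and the line term is the distance from each projected 3D segment endpoint to the observed 2D line. The function names, the pinhole camera model and the `line_weight` parameter are all assumptions made for this example.

```python
import numpy as np


def project(K, R, t, X):
    """Project Nx3 world points to Nx2 pixel coordinates (pinhole model)."""
    Xc = X @ R.T + t              # world frame -> camera frame
    uvw = Xc @ K.T                # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]


def line_from_endpoints(p, q):
    """Homogeneous 2D line through pixels p and q, scaled so a^2 + b^2 = 1."""
    l = np.cross(np.append(p, 1.0), np.append(q, 1.0))
    return l / np.linalg.norm(l[:2])


def point_line_cost(K, R, t, pts3d, pts2d, segs3d, segs2d, line_weight=1.0):
    """Sum of squared point and line reprojection errors.

    Illustrative form only; `line_weight` is a hypothetical balancing factor.
    """
    # Point term: pixel offset between projected and observed points.
    r_pt = project(K, R, t, pts3d) - pts2d
    cost = np.sum(r_pt ** 2)
    # Line term: signed distance of each projected 3D endpoint
    # to the observed 2D line.
    for (A, B), (p, q) in zip(segs3d, segs2d):
        l = line_from_endpoints(np.asarray(p, float), np.asarray(q, float))
        ends = project(K, R, t, np.vstack([A, B]))
        d = np.hstack([ends, np.ones((2, 1))]) @ l
        cost += line_weight * np.sum(d ** 2)
    return cost
```

In a full pipeline a scalar cost like this would typically be rewritten as a residual vector and minimised over the pose and 3D structure with a non-linear least-squares solver. The sketch only evaluates the objective for a given pose.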
