The potential & limitations of monoplotting in cross-view geo-localization conditions
Bradley J. Koskowich, Michael J. Starek, Scott A. King
ISPRS Open Journal of Photogrammetry and Remote Sensing, Volume 17, Article 100090, 2025. DOI: 10.1016/j.ophoto.2025.100090
Abstract
Cross-view geolocalization (CVGL) describes the general problem of determining a correlation between terrestrial and nadir-oriented imagery. Classical keypoint matching methods struggle with the extreme pose transitions between cameras present in a CVGL configuration, while deep neural networks demonstrate strong capability in this area. Traditional photogrammetry methods such as structure-from-motion (SfM) or simultaneous localization and mapping (SLAM) can technically accomplish CVGL, but require a sufficiently dense collection of camera views to recover camera pose. This research proposes an alternative CVGL solution: a series of algorithmic operations that fully automates the calculation of target camera pose via a less common photogrammetry method known as monoplotting, also called single-camera resectioning. Monoplotting requires only three inputs: a target terrestrial camera image, a nadir-oriented image, and an underlying digital surface model. 2D-3D point correspondences are derived from these inputs to optimize for the target terrestrial camera pose. The proposed method applies affine keypointing, pixel color quantization, and keypoint neighbor triangulation to codify explicit relationships that augment keypoint matching in a CVGL context. These matching results yield better initial 2D-3D point correspondences from monoplotting image pairs, resulting in lower error for single-camera resectioning. To gauge its effectiveness, the proposed methodology is applied to urban, suburban, and natural environment datasets. It demonstrates an average 42x improvement in feature matching between CVGL image pairs and reduces translation errors by 50%–75% relative to an inconsistent baseline methodology.
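At its core, the single-camera resectioning step that monoplotting relies on is a perspective-n-point (PnP) problem: given 2D pixel locations in the terrestrial image and their corresponding 3D coordinates taken from the digital surface model, solve for the camera's rotation and translation. The sketch below is a minimal, generic illustration of that step using OpenCV's solvePnPRansac on synthetic correspondences; the point values, intrinsics, and ground-truth pose are made-up placeholders, and the snippet is not the paper's pipeline (which derives its correspondences via affine keypointing, color quantization, and neighbor triangulation).

```python
import numpy as np
import cv2

# Synthetic 3D ground points (metres). In a monoplotting setting these would be
# sampled from the DSM at locations matched in the nadir image; values here are
# placeholders arranged roughly in front of the camera.
object_points = np.array([
    [  5.0, 2.0, 40.0],
    [ -8.0, 6.5, 55.0],
    [ 12.0, 1.0, 60.0],
    [ -3.0, 9.0, 35.0],
    [  7.5, 4.2, 48.0],
    [-10.0, 3.3, 42.0],
], dtype=np.float64)

# Assumed intrinsics of the terrestrial camera (focal length and principal point
# in pixels); in practice these come from calibration or image metadata.
K = np.array([[1200.0,    0.0, 640.0],
              [   0.0, 1200.0, 480.0],
              [   0.0,    0.0,   1.0]])
dist = np.zeros(4)  # assume negligible lens distortion

# Ground-truth pose used only to synthesise the 2D pixel measurements.
rvec_true = np.array([[0.05], [-0.10], [0.02]])
tvec_true = np.array([[0.5], [-0.3], [2.0]])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, dist)
image_points = image_points.reshape(-1, 2)

# Single-camera resectioning: recover the terrestrial camera pose from the
# 2D-3D correspondences, with RANSAC to tolerate mismatched points.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist,
    reprojectionError=2.0, flags=cv2.SOLVEPNP_EPNP)

if ok:
    R, _ = cv2.Rodrigues(rvec)       # rotation: world -> camera
    centre = (-R.T @ tvec).ravel()   # camera position in world coordinates
    print("Recovered camera centre:", centre)
    print("Translation error vs. truth:", np.linalg.norm(tvec - tvec_true))
else:
    print("Resectioning failed: too few consistent correspondences.")
```

With clean correspondences the recovered translation matches the synthetic ground truth to within numerical precision; the practical difficulty in CVGL, and the focus of the proposed method, is producing enough correct 2D-3D matches between such dissimilar views for this step to succeed.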