The potential & limitations of monoplotting in cross-view geo-localization conditions
Bradley J. Koskowich, Michael J. Starek, Scott A. King
ISPRS Open Journal of Photogrammetry and Remote Sensing, Volume 17, Article 100090, 2025. DOI: 10.1016/j.ophoto.2025.100090
Abstract
Cross-view geolocalization (CVGL) describes the general problem of determining a correlation between terrestrial and nadir-oriented imagery. Classical keypoint matching methods struggle with the extreme pose transitions between cameras present in a CVGL configuration, while deep neural networks demonstrate strong capability in this area. Traditional photogrammetry methods such as structure-from-motion (SfM) or simultaneous localization and mapping (SLAM) can technically accomplish CVGL, but require a sufficiently dense collection of camera views to recover camera pose. This research proposes an alternative CVGL solution: a series of algorithmic operations that fully automates the calculation of target camera pose via a less common photogrammetry method known as monoplotting, also called single-camera resectioning. Monoplotting requires only three inputs: a target terrestrial camera image, a nadir-oriented image, and an underlying digital surface model. 2D-3D point correspondences are derived from these inputs to optimize for the target terrestrial camera pose. The proposed method applies affine keypointing, pixel color quantization, and keypoint neighbor triangulation to codify explicit relationships that augment keypoint matching in a CVGL context. These matching results yield better initial 2D-3D point correspondences from monoplotting image pairs, resulting in lower error for single-camera resectioning. To gauge its effectiveness, the proposed methodology is applied to urban, suburban, and natural environment datasets. It demonstrates an average 42x improvement in feature matching between CVGL image pairs and reduces translation errors by 50%–75% relative to an inconsistent baseline methodology.
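At its core, the single-camera resectioning step that monoplotting relies on is a perspective-n-point (PnP) problem: given 2D pixel locations in the terrestrial image and their corresponding 3D coordinates taken from the digital surface model, solve for the camera's rotation and translation. The sketch below is a minimal, generic illustration of that step using OpenCV's solvePnPRansac on synthetic correspondences; the point values, intrinsics, and ground-truth pose are made-up placeholders, and the snippet is not the paper's pipeline (which derives its correspondences via affine keypointing, color quantization, and neighbor triangulation).

```python
import numpy as np
import cv2

# Synthetic 3D ground points (metres). In a monoplotting setting these would be
# sampled from the DSM at locations matched in the nadir image; values here are
# placeholders arranged roughly in front of the camera.
object_points = np.array([
    [  5.0, 2.0, 40.0],
    [ -8.0, 6.5, 55.0],
    [ 12.0, 1.0, 60.0],
    [ -3.0, 9.0, 35.0],
    [  7.5, 4.2, 48.0],
    [-10.0, 3.3, 42.0],
], dtype=np.float64)

# Assumed intrinsics of the terrestrial camera (focal length and principal point
# in pixels); in practice these come from calibration or image metadata.
K = np.array([[1200.0,    0.0, 640.0],
              [   0.0, 1200.0, 480.0],
              [   0.0,    0.0,   1.0]])
dist = np.zeros(4)  # assume negligible lens distortion

# Ground-truth pose used only to synthesise the 2D pixel measurements.
rvec_true = np.array([[0.05], [-0.10], [0.02]])
tvec_true = np.array([[0.5], [-0.3], [2.0]])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, dist)
image_points = image_points.reshape(-1, 2)

# Single-camera resectioning: recover the terrestrial camera pose from the
# 2D-3D correspondences, with RANSAC to tolerate mismatched points.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, dist,
    reprojectionError=2.0, flags=cv2.SOLVEPNP_EPNP)

if ok:
    R, _ = cv2.Rodrigues(rvec)       # rotation: world -> camera
    centre = (-R.T @ tvec).ravel()   # camera position in world coordinates
    print("Recovered camera centre:", centre)
    print("Translation error vs. truth:", np.linalg.norm(tvec - tvec_true))
else:
    print("Resectioning failed: too few consistent correspondences.")
```

With clean correspondences the recovered translation matches the synthetic ground truth to within numerical precision; the practical difficulty in CVGL, and the focus of the proposed method, is producing enough correct 2D-3D matches between such dissimilar views for this step to succeed.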