GeoDTR+: Toward Generic Cross-View Geolocalization via Geometric Disentanglement

IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2024-08-14 · DOI: 10.1109/TPAMI.2024.3443652

Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah

Vol. 46, No. 12, pp. 10419-10433 · https://ieeexplore.ieee.org/document/10636837/

Abstract

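Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works have made outstanding progress on CVGL benchmarks. However, existing methods still perform poorly in cross-area evaluation, where the training and testing data are captured from completely distinct areas. We attribute this deficiency to the models' inability to extract the geometric layout of visual features and to their overfitting to low-level details. Our preliminary work (Zhang et al. 2022) introduced a Geometric Layout Extractor (GLE) to capture the geometric layout from input features. However, that GLE does not fully exploit the information in the input features. In this work, we propose GeoDTR+, whose enhanced GLE module better models the correlations among visual features. To make full use of the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA (Workman et al. 2015), CVACT (Liu and Li, 2019), and VIGOR (Zhu et al. 2021) by a large margin (16.44%, 22.71%, and 13.66%, respectively, without polar transformation), while keeping same-area performance comparable to the existing SOTA. Moreover, we provide detailed analyses of GeoDTR+. Our code will be released at https://gitlab.com/vail-uvm/geodtr_plus.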
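The retrieval formulation of CVGL described in the abstract (match a ground image against a geo-tagged aerial database and return the location of the best match) can be sketched as follows. This is a minimal, generic illustration and not GeoDTR+'s architecture: it assumes ground and aerial descriptors have already been produced by some encoder, and the helper names `retrieve_locations` and `recall_at_k` are hypothetical. Recall@K (the fraction of queries whose true aerial match ranks among the top K) is the metric commonly reported on CVUSA, CVACT, and VIGOR.

```python
import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def retrieve_locations(ground_desc, aerial_desc, aerial_coords):
    """Hypothetical helper: geo-localize ground queries by top-1 aerial retrieval.

    ground_desc:   (N, D) ground-image descriptors (assumed precomputed by some encoder)
    aerial_desc:   (M, D) descriptors of the geo-tagged aerial database
    aerial_coords: (M, 2) geo-tag (e.g. latitude, longitude) of each aerial image
    """
    sim = _l2_normalize(ground_desc) @ _l2_normalize(aerial_desc).T  # (N, M)
    best = sim.argmax(axis=1)  # most similar aerial image per ground query
    return aerial_coords[best], best


def recall_at_k(ground_desc, aerial_desc, gt_index, k=1):
    """Hypothetical helper: fraction of queries whose true match ranks in the top k."""
    sim = _l2_normalize(ground_desc) @ _l2_normalize(aerial_desc).T
    gt_index = np.asarray(gt_index)
    gt_scores = sim[np.arange(len(gt_index)), gt_index]
    # Rank of the ground-truth aerial image = number of images scored strictly higher.
    ranks = (sim > gt_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())


# Toy example (made-up values): 3 aerial images with one-hot descriptors.
aerial = np.eye(3)
coords = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ground = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.1, 0.9, 0.0]])
pred, idx = retrieve_locations(ground, aerial, coords)
print(idx)  # [0 2 1]
print(recall_at_k(ground, aerial, gt_index=[0, 1, 2], k=1))  # 1 of 3 queries correct at top-1
```

The dense similarity matrix keeps the sketch short; for a large aerial database, an approximate nearest-neighbor index would normally replace the brute-force comparison.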
