GeoDTR+: Toward Generic Cross-View Geolocalization via Geometric Disentanglement

Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah
{"title":"GeoDTR+:通过几何解缠实现通用跨视图地理定位","authors":"Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah","doi":"10.1109/TPAMI.2024.3443652","DOIUrl":null,"url":null,"abstract":"<p><p>Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works achieve outstanding progress on CVGL benchmarks. However, existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details. Our preliminary work (Zhang et al. 2022) introduced a Geometric Layout Extractor (GLE) to capture the geometric layout from input features. However, the previous GLE does not fully exploit information in the input feature. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features. To fully explore the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA (Workman et al. 2015), CVACT (Liu and Li, 2019), and VIGOR (Zhu et al. 2021) by a large margin (16.44%, 22.71%, and 13.66% without polar transformation) while keeping the same-area performance comparable to existing SOTA. Moreover, we provide detailed analyses of GeoDTR+.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GeoDTR+: Toward Generic Cross-View Geolocalization via Geometric Disentanglement.\",\"authors\":\"Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah\",\"doi\":\"10.1109/TPAMI.2024.3443652\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works achieve outstanding progress on CVGL benchmarks. However, existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details. Our preliminary work (Zhang et al. 2022) introduced a Geometric Layout Extractor (GLE) to capture the geometric layout from input features. However, the previous GLE does not fully exploit information in the input feature. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features. To fully explore the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA (Workman et al. 2015), CVACT (Liu and Li, 2019), and VIGOR (Zhu et al. 2021) by a large margin (16.44%, 22.71%, and 13.66% without polar transformation) while keeping the same-area performance comparable to existing SOTA. 
Moreover, we provide detailed analyses of GeoDTR+.</p>\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TPAMI.2024.3443652\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/11/6 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3443652","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/6 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Recent works achieve outstanding progress on CVGL benchmarks. However, existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details. Our preliminary work (Zhang et al. 2022) introduced a Geometric Layout Extractor (GLE) to capture the geometric layout from input features. However, the previous GLE does not fully exploit information in the input features. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features. To fully explore the LS techniques from our preliminary work, we further propose Contrastive Hard Samples Generation (CHSG) to facilitate model training. Extensive experiments show that GeoDTR+ achieves state-of-the-art (SOTA) results in cross-area evaluation on CVUSA (Workman et al. 2015), CVACT (Liu and Li 2019), and VIGOR (Zhu et al. 2021) by a large margin (16.44%, 22.71%, and 13.66% without polar transformation) while keeping the same-area performance comparable to existing SOTA. Moreover, we provide detailed analyses of GeoDTR+. Our code will be released at https://gitlab.com/vail-uvm/geodtr_plus.
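The abstract frames CVGL as retrieval: a descriptor of the ground-level query is matched against descriptors of geo-tagged aerial images, and the tag of the best match gives the estimated location. The sketch below illustrates only this generic retrieval step; the descriptor dimensionality, the random stand-in embeddings, and the helper names (l2_normalize, retrieve) are illustrative assumptions, not GeoDTR+ internals.

```python
# Minimal sketch of the CVGL retrieval step: cosine-similarity search of a
# ground-level query descriptor against a database of geo-tagged aerial
# descriptors. Random vectors stand in for the outputs of trained encoders.
import numpy as np

embed_dim = 512                      # hypothetical descriptor size
rng = np.random.default_rng(0)

# Stand-ins for descriptors a trained ground/aerial encoder would produce.
aerial_db = rng.normal(size=(10_000, embed_dim)).astype(np.float32)
aerial_gps = rng.uniform(low=[-90, -180], high=[90, 180], size=(10_000, 2))
query = rng.normal(size=(embed_dim,)).astype(np.float32)

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project descriptors onto the unit sphere so a dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve(query_desc, db_desc, db_gps, k=1):
    """Return the GPS tags and similarities of the top-k most similar aerial images."""
    q = l2_normalize(query_desc)
    db = l2_normalize(db_desc)
    sims = db @ q                    # cosine similarity to every database entry
    topk = np.argsort(-sims)[:k]
    return db_gps[topk], sims[topk]

locations, scores = retrieve(query, aerial_db, aerial_gps, k=5)
print(locations[0], scores[0])       # predicted location = tag of the best match
```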
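The abstract also mentions Contrastive Hard Samples Generation (CHSG) for training, but does not detail it. The sketch below is not CHSG; it shows only the generic in-batch hard-negative recipe common in CVGL training, a soft-margin triplet loss on the hardest non-matching aerial embedding. The batch size, the scale alpha, and the random embeddings are illustrative assumptions.

```python
# Generic in-batch hard-negative contrastive loss for paired ground/aerial
# embeddings (row i of each tensor is a matched pair). Not the paper's CHSG.
import torch
import torch.nn.functional as F

def hard_negative_triplet_loss(ground_emb, aerial_emb, alpha=10.0):
    """Soft-margin triplet loss using the hardest in-batch aerial negative per query."""
    g = F.normalize(ground_emb, dim=1)
    a = F.normalize(aerial_emb, dim=1)
    sims = g @ a.t()                            # (B, B) cosine similarities
    pos = sims.diag()                           # similarity of each true pair
    off_diag = ~torch.eye(len(sims), dtype=torch.bool, device=sims.device)
    hardest_neg = sims.masked_fill(~off_diag, float('-inf')).max(dim=1).values
    # Penalize negatives whose similarity approaches or exceeds the positive's.
    return torch.log1p(torch.exp(alpha * (hardest_neg - pos))).mean()

# Toy usage with random embeddings standing in for encoder outputs.
B, D = 32, 512
loss = hard_negative_triplet_loss(torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```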
