Are These from the Same Place? Seeing the Unseen in Cross-View Image Geo-Localization

Royston Rodrigues, Masahiro Tani
DOI: 10.1109/WACV48630.2021.00380
Published in: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021-01-01
Citations: 26

Abstract

In an era where digital maps act as gateways to exploring the world, the availability of large-scale geo-tagged imagery has inspired a number of visual navigation techniques. One promising approach to visual navigation is cross-view image geo-localization. Here, the images whose location needs to be determined are matched against a database of geo-tagged aerial imagery. Methods based on this approach have sought to resolve viewpoint changes. But scenes also vary temporally: new landmarks might appear, or existing ones might disappear. One cannot guarantee storage of aerial imagery across all time instants, so a technique robust to temporal variation in scenes becomes of paramount importance. In this paper, we address the temporal gap between scenes with a two-step approach. First, we propose a semantically driven data augmentation technique that gives Siamese networks the ability to hallucinate unseen objects. Then we present the augmented samples to a multi-scale attentive embedding network to perform matching. Experiments on standard benchmarks demonstrate that integrating the proposed approach with existing frameworks improves the top-1 image recall rate from 89.84% to 93.09% on the CVUSA dataset, and from 81.03% to 87.21% on the CVACT dataset.
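The matching task and the top-1 recall metric reported above can be illustrated with a minimal sketch. This is not the paper's architecture; it assumes only that some embedding network maps ground-level queries and aerial images into a shared space (simulated here with random vectors), and shows how retrieval by cosine similarity and recall@1 are computed:

```python
import numpy as np

# Hypothetical embeddings: ground-level query i corresponds to aerial image i.
rng = np.random.default_rng(0)
n, d = 100, 64
aerial = rng.normal(size=(n, d))
ground = aerial + 0.1 * rng.normal(size=(n, d))  # noisy views of the same places

# L2-normalize so the dot product equals cosine similarity.
aerial /= np.linalg.norm(aerial, axis=1, keepdims=True)
ground /= np.linalg.norm(ground, axis=1, keepdims=True)

sim = ground @ aerial.T            # (n, n) similarity matrix
top1 = sim.argmax(axis=1)          # best aerial match for each query
recall_at_1 = (top1 == np.arange(n)).mean()
print(f"top-1 recall: {recall_at_1:.2%}")
```

In a real cross-view pipeline the embeddings would come from the trained Siamese branches rather than synthetic noise, but the retrieval and evaluation steps are the same.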