{"title":"跨视点图像地理定位的交叉注意网络","authors":"Jingjing Wang, Xi Li","doi":"10.1109/ISAS59543.2023.10164457","DOIUrl":null,"url":null,"abstract":"The task of cross-view geo-location is to get a corresponding image from a dataset of Global Positioning System (GPS) labeled aerial-view images, given a ground-view query image with an unknown location. This task presents challenges due to the significant differences in viewpoint and appearance between the two types of images. To overcome these challenges, we have developed a novel attention-based method that leverages a key localization cue. The cross-attention-based Swap Encoder Module (SEM) is proposed, which effectively aligns features by directing the network’s focus towards relevant information. Additionally, we employ an Image Proposal Network (IPN) to ensure consistent inputs of both aerial and ground-view images that correspond, during both training and validation phases. Experimental results show that our proposed network significantly outperforms existing benchmarking CVUSA dataset, with significant improvements for top-1 recall from 61.4% to 71.45%, and for top-10 from 90.49% to 92.30%.","PeriodicalId":199115,"journal":{"name":"2023 6th International Symposium on Autonomous Systems (ISAS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cross-Attention Network for Cross-View Image Geo-Localization\",\"authors\":\"Jingjing Wang, Xi Li\",\"doi\":\"10.1109/ISAS59543.2023.10164457\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The task of cross-view geo-location is to get a corresponding image from a dataset of Global Positioning System (GPS) labeled aerial-view images, given a ground-view query image with an unknown location. This task presents challenges due to the significant differences in viewpoint and appearance between the two types of images. To overcome these challenges, we have developed a novel attention-based method that leverages a key localization cue. The cross-attention-based Swap Encoder Module (SEM) is proposed, which effectively aligns features by directing the network’s focus towards relevant information. Additionally, we employ an Image Proposal Network (IPN) to ensure consistent inputs of both aerial and ground-view images that correspond, during both training and validation phases. 
Experimental results show that our proposed network significantly outperforms existing benchmarking CVUSA dataset, with significant improvements for top-1 recall from 61.4% to 71.45%, and for top-10 from 90.49% to 92.30%.\",\"PeriodicalId\":199115,\"journal\":{\"name\":\"2023 6th International Symposium on Autonomous Systems (ISAS)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 6th International Symposium on Autonomous Systems (ISAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISAS59543.2023.10164457\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 6th International Symposium on Autonomous Systems (ISAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISAS59543.2023.10164457","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cross-Attention Network for Cross-View Image Geo-Localization
The task of cross-view geo-localization is to retrieve the corresponding image from a dataset of GPS-labeled aerial-view images, given a ground-view query image with an unknown location. This task is challenging because of the significant differences in viewpoint and appearance between the two types of images. To overcome these challenges, we have developed a novel attention-based method that leverages a key localization cue. We propose a cross-attention-based Swap Encoder Module (SEM), which effectively aligns features by directing the network's focus toward relevant information. Additionally, we employ an Image Proposal Network (IPN) to ensure that corresponding aerial- and ground-view images are consistently provided as inputs during both the training and validation phases. Experimental results show that our proposed network significantly outperforms existing methods on the benchmark CVUSA dataset, improving top-1 recall from 61.4% to 71.45% and top-10 recall from 90.49% to 92.30%.
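The abstract does not describe the internals of the SEM. For readers unfamiliar with cross-attention, the PyTorch sketch below only illustrates the general mechanism of letting one view's features attend to the other's; the class and parameter names (CrossViewAttention, d_model, n_heads) and the residual/normalization layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of cross-attention between ground-view and aerial-view
# feature tokens. This is NOT the paper's SEM; all names and the block
# layout are assumptions for illustration only.
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Attend from one view's tokens (queries) to the other view's tokens."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, query_tokens: torch.Tensor, context_tokens: torch.Tensor) -> torch.Tensor:
        # query_tokens:   (B, N_q, d_model), e.g. ground-view patch features
        # context_tokens: (B, N_c, d_model), e.g. aerial-view patch features
        attended, _ = self.attn(query_tokens, context_tokens, context_tokens)
        # Residual connection plus normalization, as is typical in transformer blocks.
        return self.norm(query_tokens + attended)


if __name__ == "__main__":
    ground = torch.randn(2, 196, 256)  # dummy ground-view tokens
    aerial = torch.randn(2, 196, 256)  # dummy aerial-view tokens
    layer = CrossViewAttention()
    # "Swapped" attention in both directions: each view queries the other.
    ground_aligned = layer(ground, aerial)
    aerial_aligned = layer(aerial, ground)
    print(ground_aligned.shape, aerial_aligned.shape)
```

In a swap-encoder style, this kind of attention would plausibly be applied in both directions so that ground and aerial features are mutually aligned before retrieval, but the exact design is specified only in the full paper.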