{"title":"Learning Robust Feature Representation for Cross-View Image Geo-Localization","authors":"Wenjian Gan;Yang Zhou;Xiaofei Hu;Luying Zhao;Gaoshuang Huang;Mingbo Hou","doi":"10.1109/LGRS.2025.3543949","DOIUrl":null,"url":null,"abstract":"The cross-view image geo-localization (CVGL) refers to determining the geographic location of a given query image using an image database with the known location information. Existing methods mainly focus on learning discriminative image representations to optimize the distance of image feature representations in feature space without fully considering the positional relation information of the features and the information redundancy in the features themselves. Therefore, we proposed a cross-view image localization method that combines the global spatial relation attention (GSRA) with feature aggregation. First, we utilize the lightweight GSRA to learn the spatial location structure information of features, which fully enhances the perceptual and discriminative capabilities of the model. The proposed attention has a little effect on the complexity and memory occupancy of the model and can be generalized to other image-processing tasks. In addition, we introduce the sinkhorn algorithm for locally aggregated descriptors (SALADs), which represents the aggregation of local features as an optimal transport problem and selectively discards useless information during the clustering and assignment of features, thus enhancing the generalization and robustness of the descriptors. Experimental results on the public University-1652, CVACT, and CVUSA datasets validate the effectiveness and superiority of the proposed method. Our code is available at: <uri>https://github.com/WenjianGan/LRFR</uri>.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10896706/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Cross-view image geo-localization (CVGL) refers to determining the geographic location of a given query image using an image database with known location information. Existing methods mainly focus on learning discriminative image representations to optimize the distance between image feature representations in feature space, without fully considering the positional relation information of the features or the information redundancy within the features themselves. Therefore, we propose a cross-view image localization method that combines global spatial relation attention (GSRA) with feature aggregation. First, we utilize the lightweight GSRA to learn the spatial location structure information of features, which fully enhances the perceptual and discriminative capabilities of the model. The proposed attention has little effect on the complexity and memory occupancy of the model and can be generalized to other image-processing tasks. In addition, we introduce the Sinkhorn algorithm for locally aggregated descriptors (SALAD), which represents the aggregation of local features as an optimal transport problem and selectively discards useless information during the clustering and assignment of features, thus enhancing the generalization and robustness of the descriptors. Experimental results on the public University-1652, CVACT, and CVUSA datasets validate the effectiveness and superiority of the proposed method. Our code is available at: https://github.com/WenjianGan/LRFR.
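To make the two components described above more concrete, the PyTorch-style sketch below illustrates a lightweight attention block that reweights a feature map according to pairwise relations between spatial positions. The module name, reduction ratio, and residual connection are illustrative assumptions, not the GSRA design released in the repository above.

```python
# Hypothetical sketch of a lightweight spatial-relation attention block.
# Names, shapes, and the reduction ratio are assumptions for illustration only.
import torch
import torch.nn as nn

class SpatialRelationAttention(nn.Module):
    """Reweights a (B, C, H, W) feature map using pairwise relations between positions."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.scale = (channels // reduction) ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # (B, HW, C/r)
        k = self.key(x).flatten(2)                            # (B, C/r, HW)
        relation = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW) spatial relations
        v = x.flatten(2).transpose(1, 2)                      # (B, HW, C)
        out = relation @ v                                     # aggregate features by relation weights
        return x + out.transpose(1, 2).reshape(b, c, h, w)     # residual connection

# Usage: attn = SpatialRelationAttention(512); y = attn(torch.randn(2, 512, 16, 16))
```

The second sketch shows how local descriptors can be aggregated by treating cluster assignment as an optimal transport problem solved with Sinkhorn iterations, in the spirit of SALAD: a dustbin column lets noisy features be discarded before pooling. The cluster count, iteration count, and dustbin handling are assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Sinkhorn-based optimal-transport aggregation of local descriptors.
# Cluster count, iteration count, and dustbin handling are illustrative assumptions.
import torch
import torch.nn as nn

def sinkhorn(log_scores: torch.Tensor, iters: int = 3) -> torch.Tensor:
    """Alternately normalize rows and columns of a log-score matrix."""
    for _ in range(iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-1, keepdim=True)  # rows
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-2, keepdim=True)  # columns
    return log_scores.exp()

class SinkhornAggregator(nn.Module):
    """Assigns N local features to K clusters (plus a dustbin) and pools them."""
    def __init__(self, dim: int, num_clusters: int = 64):
        super().__init__()
        self.score_proj = nn.Linear(dim, num_clusters + 1)  # last column acts as the dustbin

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, D) local descriptors
        assignment = sinkhorn(self.score_proj(feats))    # (B, N, K+1) soft transport plan
        assignment = assignment[..., :-1]                # drop dustbin: discard useless features
        pooled = assignment.transpose(1, 2) @ feats      # (B, K, D) weighted cluster descriptors
        return torch.nn.functional.normalize(pooled.flatten(1), dim=-1)  # global descriptor
```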