DINOv2-Based UAV Visual Self-Localization in Low-Altitude Urban Environments

Impact Factor 4.6 | CAS Zone 2 (Computer Science) | JCR Q2 (Robotics)
Jiaqiang Yang;Danyang Qin;Huapeng Tang;Sili Tao;Haoze Bie;Lin Ma
DOI: 10.1109/LRA.2025.3527762
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 2080-2087
Published: January 9, 2025
URL: https://ieeexplore.ieee.org/document/10835173/
Citations: 0

Abstract

Visual self-localization technology is essential for unmanned aerial vehicles (UAVs) to achieve autonomous navigation and mission execution in environments where global navigation satellite system (GNSS) signals are unavailable. This technology estimates the UAV's geographic location by performing cross-view matching between UAV and satellite images. However, significant viewpoint differences between UAV and satellite images result in poor accuracy for existing cross-view matching methods. To address this, we integrate the DINOv2 model with UAV visual localization tasks and propose a DINOv2-based UAV visual self-localization method. Considering the inherent differences between pre-trained models and cross-view matching tasks, we propose a global-local feature adaptive enhancement method (GLFA). This method leverages Transformer and multi-scale convolutions to capture long-range dependencies and local spatial information in visual images, improving the model's ability to recognize key discriminative landmarks. In addition, we propose a cross-enhancement method based on a spatial pyramid (CESP), which constructs a multi-scale spatial pyramid to cross-enhance features, effectively improving the ability of the features to perceive multi-scale spatial information. Experimental results demonstrate that the proposed method achieves impressive scores of 86.27% in R@1 and 88.87% in SDM@1 on the DenseUAV public benchmark dataset, providing a novel solution for UAV visual self-localization.
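The R@1 (Recall at 1) score reported above measures the fraction of UAV query images whose single nearest satellite image, under feature similarity, is the correct ground-truth match. As a minimal illustrative sketch (the feature extractor and data are hypothetical toy values, not the paper's pipeline; only the metric logic is shown), Recall@1 over cosine-similarity retrieval can be computed as:

```python
import numpy as np

def recall_at_1(query_feats, gallery_feats, gt_indices):
    """Fraction of queries whose top-1 gallery item (by cosine
    similarity) is the ground-truth match."""
    # L2-normalize so a plain dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                      # (num_queries, num_gallery)
    top1 = sims.argmax(axis=1)          # best gallery index per query
    return float((top1 == np.asarray(gt_indices)).mean())

# Toy example: 3 UAV queries, 4 satellite gallery images.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 8))
gt = [2, 0, 3]
# Each query is a slightly noisy copy of its ground-truth gallery feature.
queries = gallery[gt] + 0.05 * rng.normal(size=(3, 8))
print(recall_at_1(queries, gallery, gt))  # with noise this small, all top-1 matches should be correct
```

In a real evaluation the query and gallery features would come from the trained cross-view model rather than random vectors, but the ranking logic is the same.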
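The abstract's CESP module builds a multi-scale spatial pyramid over feature maps; its exact architecture is not given here. As a hedged, generic sketch of the underlying spatial-pyramid idea only (all names and pooling choices are this sketch's assumptions, not the paper's design), a feature map can be summarized at several grid resolutions and the cell descriptors concatenated:

```python
import numpy as np

def spatial_pyramid_descriptor(feat_map, levels=(1, 2, 4)):
    """Average-pool a (C, H, W) feature map over an L x L grid at each
    pyramid level and concatenate the per-cell channel descriptors."""
    C, H, W = feat_map.shape
    pooled = []
    for L in levels:
        for i in range(L):
            for j in range(L):
                cell = feat_map[:, i*H//L:(i+1)*H//L, j*W//L:(j+1)*W//L]
                pooled.append(cell.mean(axis=(1, 2)))  # (C,) per cell
    # One (C,) vector per cell: 1 + 4 + 16 = 21 cells for levels (1, 2, 4).
    return np.concatenate(pooled)

fm = np.random.default_rng(1).normal(size=(16, 8, 8))
desc = spatial_pyramid_descriptor(fm)
print(desc.shape)  # (336,) = 16 channels * 21 cells
```

The coarse level captures global layout while the finer grids preserve local spatial detail, which is the intuition behind letting features "perceive multi-scale spatial information".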
Source Journal
IEEE Robotics and Automation Letters
Subject category: Computer Science (Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Annual publications: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.