DINOv2-Based UAV Visual Self-Localization in Low-Altitude Urban Environments

Impact Factor 4.6 | CAS Zone 2 (Computer Science) | JCR Q2 (Robotics)
Jiaqiang Yang;Danyang Qin;Huapeng Tang;Sili Tao;Haoze Bie;Lin Ma
DOI: 10.1109/LRA.2025.3527762
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 2080-2087
Published: January 9, 2025
URL: https://ieeexplore.ieee.org/document/10835173/
Citations: 0

Abstract

Visual self-localization technology is essential for unmanned aerial vehicles (UAVs) to achieve autonomous navigation and mission execution in environments where global navigation satellite system (GNSS) signals are unavailable. This technology estimates the UAV's geographic location by performing cross-view matching between UAV and satellite images. However, significant viewpoint differences between UAV and satellite images result in poor accuracy for existing cross-view matching methods. To address this, we integrate the DINOv2 model with UAV visual localization tasks and propose a DINOv2-based UAV visual self-localization method. Considering the inherent differences between pre-trained models and cross-view matching tasks, we propose a global-local feature adaptive enhancement method (GLFA). This method leverages Transformer and multi-scale convolutions to capture long-range dependencies and local spatial information in visual images, improving the model's ability to recognize key discriminative landmarks. In addition, we propose a cross-enhancement method based on a spatial pyramid (CESP), which constructs a multi-scale spatial pyramid to cross-enhance features, effectively improving the ability of the features to perceive multi-scale spatial information. Experimental results demonstrate that the proposed method achieves impressive scores of 86.27% in R@1 and 88.87% in SDM@1 on the DenseUAV public benchmark dataset, providing a novel solution for UAV visual self-localization.
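The R@1 (Recall at 1) score reported above measures the fraction of UAV query images whose single nearest satellite image, under feature similarity, is the correct ground-truth match. As a minimal illustrative sketch (the feature extractor and data are hypothetical toy values, not the paper's pipeline; only the metric logic is shown), Recall@1 over cosine-similarity retrieval can be computed as:

```python
import numpy as np

def recall_at_1(query_feats, gallery_feats, gt_indices):
    """Fraction of queries whose top-1 gallery item (by cosine
    similarity) is the ground-truth match."""
    # L2-normalize so a plain dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                      # (num_queries, num_gallery)
    top1 = sims.argmax(axis=1)          # best gallery index per query
    return float((top1 == np.asarray(gt_indices)).mean())

# Toy example: 3 UAV queries, 4 satellite gallery images.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 8))
gt = [2, 0, 3]
# Each query is a slightly noisy copy of its ground-truth gallery feature.
queries = gallery[gt] + 0.05 * rng.normal(size=(3, 8))
print(recall_at_1(queries, gallery, gt))  # with noise this small, all top-1 matches should be correct
```

In a real evaluation the query and gallery features would come from the trained cross-view model rather than random vectors, but the ranking logic is the same.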
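The abstract's CESP module builds a multi-scale spatial pyramid over feature maps; its exact architecture is not given here. As a hedged, generic sketch of the underlying spatial-pyramid idea only (all names and pooling choices are this sketch's assumptions, not the paper's design), a feature map can be summarized at several grid resolutions and the cell descriptors concatenated:

```python
import numpy as np

def spatial_pyramid_descriptor(feat_map, levels=(1, 2, 4)):
    """Average-pool a (C, H, W) feature map over an L x L grid at each
    pyramid level and concatenate the per-cell channel descriptors."""
    C, H, W = feat_map.shape
    pooled = []
    for L in levels:
        for i in range(L):
            for j in range(L):
                cell = feat_map[:, i*H//L:(i+1)*H//L, j*W//L:(j+1)*W//L]
                pooled.append(cell.mean(axis=(1, 2)))  # (C,) per cell
    # One (C,) vector per cell: 1 + 4 + 16 = 21 cells for levels (1, 2, 4).
    return np.concatenate(pooled)

fm = np.random.default_rng(1).normal(size=(16, 8, 8))
desc = spatial_pyramid_descriptor(fm)
print(desc.shape)  # (336,) = 16 channels * 21 cells
```

The coarse level captures global layout while the finer grids preserve local spatial detail, which is the intuition behind letting features "perceive multi-scale spatial information".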
Source Journal
IEEE Robotics and Automation Letters
Subject category: Computer Science (Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Annual publications: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.