{"title":"Total solution for simultaneous pose and correspondence estimation of drone images in urban environments","authors":"Shuang Li, Jie Shan","doi":"10.1016/j.isprsjprs.2025.06.027","DOIUrl":null,"url":null,"abstract":"<div><div>Vision-based pose estimation for drone images in urban environments is particularly challenging when reliable GNSS and IMU signals are unavailable and the search space spans large areas. Traditional methods depend on known correspondences of well-defined landmark objects, which are not always feasible in complex urban environments. To address this problem, we propose a total solution that simultaneously estimates the image pose and its correspondences to a semantic map database. A cascaded network, named dual-head SegFormer, is developed to generate multi-class semantic segmentation maps and high-quality road centerlines from images. A city-wide coarse-to-fine image localization strategy aligns the image segmentation map with the database map using class-label consistency and graph representation indices, yielding initial poses for further optimization. The final pose is determined by minimizing a novel objective function that evaluates the differences between the image and database across three key aspects: semantic maps, road attributes, and tie point reprojection errors. Evaluated on three urban drone image datasets, our method achieves position and rotation errors below 2.03 m and 2 <span><math><mrow><msup><mrow><mspace></mspace></mrow><mo>°</mo></msup></mrow></math></span> relative to the bundle adjustment results. By incorporating semantic features and an improved objective function, our method achieves notable enhancements in robustness and accuracy compared to prior approach that relied exclusively on road attributes. This work provides a dependable alternative for vision-based navigation, further reducing dependence on GNSS data or precise initial pose information.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"227 ","pages":"Pages 349-365"},"PeriodicalIF":10.6000,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISPRS Journal of Photogrammetry and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0924271625002539","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY, PHYSICAL","Score":null,"Total":0}
Citations: 0
Abstract
Vision-based pose estimation for drone images in urban environments is particularly challenging when reliable GNSS and IMU signals are unavailable and the search space spans large areas. Traditional methods depend on known correspondences of well-defined landmark objects, which are not always available in complex urban environments. To address this problem, we propose a total solution that simultaneously estimates the image pose and its correspondences to a semantic map database. A cascaded network, named dual-head SegFormer, is developed to generate multi-class semantic segmentation maps and high-quality road centerlines from images. A city-wide coarse-to-fine image localization strategy aligns the image segmentation map with the database map using class-label consistency and graph representation indices, yielding initial poses for further optimization. The final pose is determined by minimizing a novel objective function that evaluates the differences between the image and database across three key aspects: semantic maps, road attributes, and tie point reprojection errors. Evaluated on three urban drone image datasets, our method achieves position and rotation errors below 2.03 m and 2° relative to the bundle adjustment results. By incorporating semantic features and an improved objective function, our method achieves notable gains in robustness and accuracy compared to a prior approach that relied exclusively on road attributes. This work provides a dependable alternative for vision-based navigation, further reducing dependence on GNSS data or precise initial pose information.
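To make the final optimization step concrete, the sketch below mimics the structure of an objective that stacks weighted residuals for the three aspects named in the abstract. It is a minimal sketch, not the paper's actual formulation: the weights, function names, and toy camera/tie-point data are illustrative assumptions, the semantic-map and road-attribute terms are left as zero-valued placeholders, and only the tie-point reprojection term is implemented.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Toy pinhole camera and 3D tie points (illustrative, not from the paper).
K = np.array([[1000.0, 0.0, 512.0],
              [0.0, 1000.0, 384.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
tie_3d = rng.uniform([-50, -50, 80], [50, 50, 120], size=(20, 3))

def project(pose, pts_3d):
    """Project 3D points for a pose given as [rx, ry, rz, tx, ty, tz]
    (axis-angle rotation plus translation)."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    cam = pts_3d @ R.T + pose[3:]        # world -> camera frame
    uv = cam @ K.T                       # pinhole projection
    return uv[:, :2] / uv[:, 2:3]

# Synthesize 2D observations from a "true" pose for this toy example.
true_pose = np.array([0.02, -0.01, 0.005, 1.0, -2.0, 0.5])
tie_2d = project(true_pose, tie_3d)

def semantic_residuals(pose):
    # Placeholder for class-label disagreement between the projected
    # database semantic map and the image segmentation (assumption).
    return np.zeros(1)

def road_residuals(pose):
    # Placeholder for road-attribute differences, e.g. centerline
    # geometry mismatches (assumption).
    return np.zeros(1)

def objective(pose, w_sem=1.0, w_road=1.0, w_tie=1.0):
    """Stack the three weighted residual groups named in the abstract:
    semantic maps, road attributes, and tie-point reprojection errors."""
    r_tie = (project(pose, tie_3d) - tie_2d).ravel()
    return np.concatenate([w_sem * semantic_residuals(pose),
                           w_road * road_residuals(pose),
                           w_tie * r_tie])

# A coarse pose from city-wide localization would seed the refinement;
# here the zero pose stands in for that initialization.
pose0 = np.zeros(6)
result = least_squares(objective, pose0)
print("refined pose:", np.round(result.x, 4))
```

In the paper's pipeline the coarse-to-fine localization supplies `pose0`, and the semantic and road terms would be recomputed from the current pose at each iteration rather than returning constants as in this sketch.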
Journal Introduction
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) is the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It serves as a platform for scientists and professionals worldwide working in photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate the communication and dissemination of advances in these disciplines, while also serving as a comprehensive reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.