{"title":"Total solution for simultaneous pose and correspondence estimation of drone images in urban environments","authors":"Shuang Li, Jie Shan","doi":"10.1016/j.isprsjprs.2025.06.027","DOIUrl":null,"url":null,"abstract":"<div><div>Vision-based pose estimation for drone images in urban environments is particularly challenging when reliable GNSS and IMU signals are unavailable and the search space spans large areas. Traditional methods depend on known correspondences of well-defined landmark objects, which are not always feasible in complex urban environments. To address this problem, we propose a total solution that simultaneously estimates the image pose and its correspondences to a semantic map database. A cascaded network, named dual-head SegFormer, is developed to generate multi-class semantic segmentation maps and high-quality road centerlines from images. A city-wide coarse-to-fine image localization strategy aligns the image segmentation map with the database map using class-label consistency and graph representation indices, yielding initial poses for further optimization. The final pose is determined by minimizing a novel objective function that evaluates the differences between the image and database across three key aspects: semantic maps, road attributes, and tie point reprojection errors. Evaluated on three urban drone image datasets, our method achieves position and rotation errors below 2.03 m and 2 <span><math><mrow><msup><mrow><mspace></mspace></mrow><mo>°</mo></msup></mrow></math></span> relative to the bundle adjustment results. By incorporating semantic features and an improved objective function, our method achieves notable enhancements in robustness and accuracy compared to prior approach that relied exclusively on road attributes. This work provides a dependable alternative for vision-based navigation, further reducing dependence on GNSS data or precise initial pose information.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"227 ","pages":"Pages 349-365"},"PeriodicalIF":10.6000,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISPRS Journal of Photogrammetry and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0924271625002539","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY, PHYSICAL","Score":null,"Total":0}
Citations: 0
Abstract
Vision-based pose estimation for drone images in urban environments is particularly challenging when reliable GNSS and IMU signals are unavailable and the search space spans large areas. Traditional methods depend on known correspondences of well-defined landmark objects, which are not always available in complex urban environments. To address this problem, we propose a total solution that simultaneously estimates the image pose and its correspondences to a semantic map database. A cascaded network, named dual-head SegFormer, is developed to generate multi-class semantic segmentation maps and high-quality road centerlines from images. A city-wide coarse-to-fine image localization strategy aligns the image segmentation map with the database map using class-label consistency and graph representation indices, yielding initial poses for further optimization. The final pose is determined by minimizing a novel objective function that evaluates the differences between the image and database across three key aspects: semantic maps, road attributes, and tie point reprojection errors. Evaluated on three urban drone image datasets, our method achieves position and rotation errors below 2.03 m and 2° relative to the bundle adjustment results. By incorporating semantic features and an improved objective function, our method achieves notable gains in robustness and accuracy compared to a prior approach that relied exclusively on road attributes. This work provides a dependable alternative for vision-based navigation, further reducing dependence on GNSS data or precise initial pose information.
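To make the final optimization step concrete, the sketch below mimics the structure of an objective that stacks weighted residuals for the three aspects named in the abstract. It is a minimal sketch, not the paper's actual formulation: the weights, function names, and toy camera/tie-point data are illustrative assumptions, the semantic-map and road-attribute terms are left as zero-valued placeholders, and only the tie-point reprojection term is implemented.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Toy pinhole camera and 3D tie points (illustrative, not from the paper).
K = np.array([[1000.0, 0.0, 512.0],
              [0.0, 1000.0, 384.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
tie_3d = rng.uniform([-50, -50, 80], [50, 50, 120], size=(20, 3))

def project(pose, pts_3d):
    """Project 3D points for a pose given as [rx, ry, rz, tx, ty, tz]
    (axis-angle rotation plus translation)."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    cam = pts_3d @ R.T + pose[3:]        # world -> camera frame
    uv = cam @ K.T                       # pinhole projection
    return uv[:, :2] / uv[:, 2:3]

# Synthesize 2D observations from a "true" pose for this toy example.
true_pose = np.array([0.02, -0.01, 0.005, 1.0, -2.0, 0.5])
tie_2d = project(true_pose, tie_3d)

def semantic_residuals(pose):
    # Placeholder for class-label disagreement between the projected
    # database semantic map and the image segmentation (assumption).
    return np.zeros(1)

def road_residuals(pose):
    # Placeholder for road-attribute differences, e.g. centerline
    # geometry mismatches (assumption).
    return np.zeros(1)

def objective(pose, w_sem=1.0, w_road=1.0, w_tie=1.0):
    """Stack the three weighted residual groups named in the abstract:
    semantic maps, road attributes, and tie-point reprojection errors."""
    r_tie = (project(pose, tie_3d) - tie_2d).ravel()
    return np.concatenate([w_sem * semantic_residuals(pose),
                           w_road * road_residuals(pose),
                           w_tie * r_tie])

# A coarse pose from city-wide localization would seed the refinement;
# here the zero pose stands in for that initialization.
pose0 = np.zeros(6)
result = least_squares(objective, pose0)
print("refined pose:", np.round(result.x, 4))
```

In the paper's pipeline the coarse-to-fine localization supplies `pose0`, and the semantic and road terms would be recomputed from the current pose at each iteration rather than returning constants as in this sketch.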
Journal Introduction
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) is the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It serves as a platform for scientists and professionals worldwide working in photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate the communication and dissemination of advances in these disciplines, while also serving as a comprehensive reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.