Title: Multimodal mathematical reasoning embedded in aerial vehicle imagery: Benchmarking, analysis, and exploration
Authors: Yue Zhou, Litong Feng, Mengcheng Lan, Xue Yang, Qingyun Li, Yiping Ke, Xue Jiang, Wayne Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 289–303. Published 2025-09-24. DOI: 10.1016/j.isprsjprs.2025.09.007
Abstract: Mathematical reasoning is critical for tasks such as precise distance and area computations, trajectory estimations, and spatial analysis in unmanned aerial vehicle (UAV) based remote sensing, yet current vision-language models (VLMs) have not been adequately tested in this domain. To address this gap, we introduce AVI-Math, the first benchmark to rigorously evaluate multimodal mathematical reasoning in aerial vehicle imagery, moving beyond simple counting tasks to include domain-specific knowledge in areas such as geometry, logic, and algebra. The dataset comprises 3,773 high-quality vehicle-related questions captured from UAV views, covering 6 mathematical subjects and 20 topics. The data, collected at varying altitudes and from multiple UAV angles, reflects real-world UAV scenarios, ensuring the diversity and complexity of the constructed mathematical problems. In this paper, we benchmark 14 prominent VLMs through a comprehensive evaluation and demonstrate that, despite their success on previous multimodal benchmarks, these models struggle with the reasoning tasks in AVI-Math. Our detailed analysis highlights significant limitations in the mathematical reasoning capabilities of current VLMs and suggests avenues for future research. Furthermore, we explore the use of Chain-of-Thought prompting and fine-tuning techniques, which show promise in addressing the reasoning challenges in AVI-Math. Our findings not only expose the limitations of VLMs in mathematical reasoning but also offer valuable insights for advancing UAV-based trustworthy VLMs in real-world applications. The code and datasets will be released at https://github.com/VisionXLab/avi-math.
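The Chain-of-Thought prompting explored in this abstract can be sketched minimally. The template, question text, and answer format below are illustrative assumptions, not AVI-Math's actual prompts:

```python
def build_cot_prompt(question, choices=None):
    """Assemble a Chain-of-Thought prompt for a vision-language model.

    The exact template used by AVI-Math is not specified in the abstract;
    this is a generic CoT formulation."""
    lines = [question]
    if choices:
        lines.append("Choices: " + "; ".join(choices))
    # The key CoT instruction: ask for intermediate reasoning steps
    # before the final answer, instead of the answer alone.
    lines.append("Let's think step by step, then state the final answer "
                 "on a line starting with 'Answer:'.")
    return "\n".join(lines)

prompt = build_cot_prompt(
    "Two parked vehicles are 12 px apart; the ground sampling distance "
    "is 0.05 m/px. What is their distance in metres?",
    ["0.6 m", "6 m", "2.4 m"])
```

The same question without the final instruction line would be a direct-answer prompt; the paper's comparison is between these two prompting regimes.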
Title: Arctic sea ice motion retrieval from multisource SAR images using a keypoint-free feature tracking algorithm
Authors: Tian Gao, Chaozhen Lan, Chunxia Zhou, Yongxian Zhang, Wenjun Huang, Yiqiao Wang, Longhao Wang
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 258–274. Published 2025-09-23. DOI: 10.1016/j.isprsjprs.2025.09.013
Abstract: The rapid changes in Arctic sea ice serve as an important indicator for the global climate system, and remote sensing-based sea ice motion (SIM) monitoring has become a key technological tool in polar environment research. Traditional feature tracking algorithms such as ORB and A-KAZE have limitations in Synthetic Aperture Radar (SAR) image-based SIM retrieval. We propose a novel deep learning-based feature tracking framework. First, a phase consistency edge enhancement module extracts edge structure features from SAR images, improving the uniformity of feature distribution. Next, a geographic location constraint attention mechanism embeds the physical model of SIM into the Transformer architecture: by establishing a mapping between geographic coordinates and feature space, it guides sparse self-attention and region-focused cross-attention, significantly enhancing the generalization of feature representation and improving matching efficiency. The experiments construct a test set from multisource SAR data acquired by Sentinel-1, ALOS-2, C-SAR, Envisat ASAR, and RadarSat-2. Results show that, compared with traditional methods, the proposed algorithm not only greatly reduces computation time but also improves the accuracy of motion-vector speed and direction estimates by approximately 50% on multisource SAR images. The deep learning feature tracking framework developed in this study provides new technical support and research perspectives for Arctic sea ice dynamics.
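The reported ~50% gain concerns the speed and direction accuracy of retrieved motion vectors. A minimal sketch of how such per-vector errors are commonly computed from an estimated and a reference drift vector (variable names and the evaluation convention are illustrative, not taken from the paper):

```python
import math

def speed_direction_error(u_est, v_est, u_ref, v_ref):
    """Speed error (magnitude difference) and direction error (smallest
    angle between the two vectors, in degrees) for one motion vector pair."""
    speed_est = math.hypot(u_est, v_est)
    speed_ref = math.hypot(u_ref, v_ref)
    speed_err = abs(speed_est - speed_ref)
    # Angle of each vector, then wrap the difference into [0, 180].
    ang = math.degrees(math.atan2(v_est, u_est) - math.atan2(v_ref, u_ref))
    dir_err = abs((ang + 180.0) % 360.0 - 180.0)
    return speed_err, dir_err

# A vector estimated as (3, 4) against a reference (0, 5):
# both have magnitude 5, so the speed error is 0 and only direction differs.
s, d = speed_direction_error(3.0, 4.0, 0.0, 5.0)
```

Averaging these two quantities over all matched vectors gives the speed and direction accuracy figures the abstract refers to.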
Title: Detecting changes without comparing images: Rules induced change detection in heterogeneous remote sensing images
Authors: Yuli Sun, Lin Lei, Zhang Li, Gangyao Kuang, Qifeng Yu
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 241–257. Published 2025-09-23. DOI: 10.1016/j.isprsjprs.2025.09.009
Abstract: Heterogeneous change detection (HCD) is crucial for monitoring surface changes using varied remote sensing data, especially in disaster emergency response and environmental monitoring. To make heterogeneous images comparable, previous methods devise complex transformation functions that map the images into a common domain for comparison, so HCD performance is constrained by the accuracy and robustness of these transformations. Unlike existing comparison-based HCD methods that rely on complex transformations and feature alignments between heterogeneous images, this paper proposes an unsupervised rules-induced energy model (RIEM) that detects changes by independently analyzing intra-image relationships, without explicitly comparing the heterogeneous images. This frees HCD from the complicated and challenging transformations and interactions between heterogeneous images. Specifically, we first establish the connections between the class relationships (same/different) and change labels (changed/unchanged) of pairwise superpixels, and then derive six rules for determining the change label of each superpixel, which enables detecting changes by considering only the intra-image relationships within each image, without inter-image comparisons. We then build an energy-based model, implementing four types of energy loss functions, to exploit the rules' ability to identify changes. Remarkably, since the rules used in the energy model are derived from the nature of the change detection problem itself, the proposed RIEM is highly robust to imaging conditions. Extensive experiments on seven datasets demonstrate the efficacy of RIEM in detecting changes from heterogeneous images. The code is released at https://github.com/yulisun/RIEM.
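The rule-based core idea can be illustrated with a toy sketch. RIEM's six rules and energy terms are given in the paper itself; the underlying observation, that intra-image same/different relations constrain change labels without any cross-image comparison of pixel values, reduces to:

```python
def pair_evidence(same_in_x, same_in_y):
    """Evidence one superpixel pair contributes, using only intra-image
    class relations (no cross-image comparison of pixel values).

    same_in_x / same_in_y: whether the two superpixels belong to the
    same class within image X and within image Y, respectively."""
    if same_in_x == same_in_y:
        # Relation preserved across images: the pair tends to share
        # the same change label (both changed or both unchanged).
        return "labels-agree"
    # Relation flipped between the images: at least one of the two
    # superpixels must have changed.
    return "at-least-one-changed"

assert pair_evidence(True, True) == "labels-agree"
assert pair_evidence(True, False) == "at-least-one-changed"
```

Aggregating such pairwise evidence over many pairs, as an energy to be minimized, is what lets the model assign per-superpixel change labels without ever mapping the two modalities into a common domain.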
Title: Global comparison of temperature and water stress representations in light use efficiency models for gross primary productivity estimation
Authors: Enjun Gong, Jing Zhang, Zhihui Wang, Qingfeng Hu, Hongying Bai, Jun Wang
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 275–288. Published 2025-09-23. DOI: 10.1016/j.isprsjprs.2025.09.018
Abstract: Accurate estimation of gross primary productivity (GPP) with light use efficiency (LUE) models based on remote sensing data remains a challenge, because these models impose environmental constraints through temperature and water stress functions, and different LUE models represent those stresses inconsistently. We conducted a global-scale comparison of combinations of temperature and water stress functions to systematically evaluate their impact on GPP estimates. Monthly observations from 172 eddy covariance flux towers distributed around the world were combined with six stress functions drawn from three prominent LUE models (the Carnegie–Ames–Stanford Approach (CASA), the Vegetation Photosynthesis Model (VPM), and the Moderate Resolution Imaging Spectroradiometer (MODIS) GPP model) to develop nine candidate schemes, and model performance was evaluated by the coefficient of determination (R²) and root mean square error (RMSE). Globally, the best-performing configuration coupled the water stress function from VPM with the temperature stress function from CASA (R² = 0.721; RMSE = 55.4 g C·m⁻²·month⁻¹). However, performance varied markedly with vegetation type: temperature stress was the principal limiting factor in forests, croplands, and wetlands, whereas water stress dominated in arid and temperate grasslands. A feature importance analysis using the XGBoost algorithm corroborated this pattern. The differences among stress functions mainly originated from their input parameters. A sensitivity analysis revealed that GPP is most responsive to changes in the optimum temperature, compared with water or temperature extremes. These findings underscore the need to tailor stress parameterization to specific climate zones and vegetation types. They provide clear guidance for improving GPP estimates based on remote sensing products and lay a foundation for next-generation carbon-cycle models that consider the effects of climate change.
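The stress functions being compared all plug into one general LUE form, GPP = PAR × fPAR × εmax × Ts × Ws. A minimal sketch using the widely published VPM-style stress scalars; the parameter values are illustrative defaults, not the paper's calibrations:

```python
def vpm_temperature_stress(t, t_min=0.0, t_opt=25.0, t_max=40.0):
    """VPM-style temperature stress scalar in [0, 1]: equals 1 at the
    optimum temperature and falls to 0 at the limits (widely published
    form; parameter values here are illustrative)."""
    num = (t - t_min) * (t - t_max)
    den = num - (t - t_opt) ** 2
    return 0.0 if den == 0 else max(0.0, num / den)

def vpm_water_stress(lswi, lswi_max=0.6):
    """VPM water stress scalar from the Land Surface Water Index."""
    return (1.0 + lswi) / (1.0 + lswi_max)

def gpp(par, fpar, eps_max, t, lswi):
    """Light use efficiency model: GPP = PAR * fPAR * eps_max * Ts * Ws."""
    return par * fpar * eps_max * vpm_temperature_stress(t) * vpm_water_stress(lswi)

# Stress is 1 at the optimum temperature and shrinks toward the limits.
assert abs(vpm_temperature_stress(25.0) - 1.0) < 1e-9
```

Swapping Ts or Ws for another model's function, while holding the rest of the equation fixed, is exactly the kind of scheme combination the study evaluates.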
Title: Fractal-domain deep learning with Transformer architecture for SAR ship classification
Authors: Gang Xiong, Tao Zhen, Wenyu Huang, Bingxu Min, Wenxian Yu
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 208–226. Published 2025-09-20. DOI: 10.1016/j.isprsjprs.2025.09.002
Abstract: This paper extends deep learning from the spatiotemporal-frequency domains to the fractal domain and, to the best of our knowledge, introduces the concept of fractal-domain deep learning for the first time. First, a Fractal Domain Transformer (FracFormer) architecture is proposed to address the challenging problem of SAR image target classification in complex scenarios. Based on the Singularity Exponent-Domain Image Feature Transform (SIFT), FracFormer transforms original images into fractal-domain feature images, applies fractal feature filters and combiners for iterative learning, and achieves image classification through fractal feature mixers and classifiers. In particular, we derive the SIFT-based fractal feature filtering theorem and feature combination theorem, providing theoretical support for the design of FracFormer's core modules. On the OpenSARShip2.0 dataset, our model outperforms baseline models, with improvements ranging from 0.37% to 11.83% on average. Extensive visualization of the model's fractal-domain feature learning shows that FracFormer is consistent with the two theorems, demonstrating good interpretability. Furthermore, FracFormer converges quickly and generalizes well in low signal-to-noise-ratio scenarios: at 0 dB sea clutter, it achieves a 9.96% improvement in classification performance over the frequency-domain GFNet and accelerates convergence by approximately 36%. The findings of this study are expected to provide new learning paradigms and model architectures for the fields of deep learning and computer vision.
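The paper's singularity-exponent transform is not reproduced here. As a generic illustration of the fractal-feature family it builds on, the standard box-counting dimension estimate (function name and box sizes are illustrative):

```python
import math

def box_count_dimension(points, sizes=(1, 2, 4, 8)):
    """Estimate the fractal (box-counting) dimension of a 2-D point set
    via a least-squares fit of log N(s) against log(1/s), where N(s) is
    the number of boxes of side s that contain at least one point."""
    xs, ys = [], []
    for s in sizes:
        boxes = {(int(px // s), int(py // s)) for px, py in points}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the least-squares line is the dimension estimate.
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# A filled 16x16 grid is a 2-D object, so its estimate is 2.
grid = [(i, j) for i in range(16) for j in range(16)]
dim = box_count_dimension(grid)
```

Per-pixel variants of such scaling exponents (singularity exponents) are what a fractal-domain feature image carries in place of raw intensities.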
Title: UAVPairs: A benchmark for match pair retrieval of large-scale UAV images
Authors: Junhuan Liu, San Jiang, Wei Ge, Wei Huang, Bingxuan Guo, Qingquan Li
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 227–240. Published 2025-09-20. DOI: 10.1016/j.isprsjprs.2025.09.008
Abstract: Match pair retrieval aims to identify spatially overlapping image pairs, which can accelerate feature matching and guide Structure from Motion (SfM) based 3D reconstruction. The primary contribution of this paper is a challenging benchmark dataset, UAVPairs, and a training pipeline designed for match pair retrieval of large-scale UAV images. First, the UAVPairs dataset, comprising 21,622 high-resolution images across 30 diverse scenes, is constructed; the 3D points and tracks generated by SfM-based 3D reconstruction define the geometric similarity of image pairs, ensuring that genuinely matchable pairs are used for training. Second, to avoid the expensive cost of global hard negative mining, a batched nontrivial sample mining strategy is proposed that leverages the geometric similarity and multi-scene structure of UAVPairs to generate training samples and accelerate training. Third, recognizing the limitation of pair-based losses, a ranked list loss is designed to improve the discrimination of image retrieval models by optimizing the global similarity structure constructed from the positive and negative sets. Finally, the effectiveness of the UAVPairs dataset and training pipeline is validated through comprehensive experiments on three distinct large-scale UAV datasets. The results demonstrate that models trained with UAVPairs and the ranked list loss achieve significantly higher retrieval accuracy than models trained on existing datasets or with conventional losses, and these gains translate into better view graph connectivity and higher-quality reconstructed 3D models. The trained models are also more robust than hand-crafted global features, particularly in challenging repetitively textured and weakly textured scenes, offering an effective solution for match pair retrieval of large-scale UAV images. The dataset will be made publicly available at https://github.com/json87/UAVPairs.
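A ranked list loss, unlike a pair or triplet loss, scores the whole positive and negative sets of a query at once. A simplified scalar sketch of that idea (the boundary, margin, and the weighting used in the paper's implementation are illustrative assumptions):

```python
def ranked_list_loss(d_pos, d_neg, alpha=1.2, margin=0.4):
    """Simplified ranked list loss for one query: pull every non-trivial
    positive inside the boundary (alpha - margin), push every non-trivial
    negative beyond alpha. d_pos / d_neg are embedding distances from the
    query to its positive / negative set."""
    pos_loss = [d - (alpha - margin) for d in d_pos if d > alpha - margin]
    neg_loss = [alpha - d for d in d_neg if d < alpha]
    # Average within each set so set sizes do not dominate the objective.
    p = sum(pos_loss) / len(pos_loss) if pos_loss else 0.0
    n = sum(neg_loss) / len(neg_loss) if neg_loss else 0.0
    return p + n

# Positives already inside the boundary and negatives already beyond
# it contribute nothing, so this configuration gives zero loss:
assert ranked_list_loss([0.3, 0.5], [1.5, 2.0]) == 0.0
```

Because every violating sample in the ranked lists contributes, the global similarity structure around the query is optimized, which is the advantage over pair-based losses the abstract points to.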
Title: A Landsat-based burned area atlas (2000–2023) for the Niassa Special Reserve, Mozambique using U-Net deep learning
Authors: Cremildo R.G. Dias, Alana K. Neves, João M.N. Silva, Natasha S. Ribeiro, José M.C. Pereira
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 147–169. Published 2025-09-18. DOI: 10.1016/j.isprsjprs.2025.09.005
Abstract: Savanna burning plays a key ecological role in miombo woodlands, influencing vegetation regeneration, biodiversity, and ecosystem structure. This study provides a comprehensive fire atlas and spatiotemporal assessment of fire activity from 2000 to 2023 in the Niassa Special Reserve (NSR), northern Mozambique, a key protected area in sub-Saharan Africa. Using medium-resolution satellite imagery and a deep learning classification approach (U-Net), we mapped annual burned areas and analysed spatial and temporal patterns of burning, including recurrence and seasonality. The results indicate a mean fire return interval of 2.8 years, with distinct differences between the Early Dry Season (EDS) and Late Dry Season (LDS): fire recurrence was as frequent as 1.9 years in the LDS, while EDS intervals extended up to 30 years. Fire activity was most intense in the central and eastern lowlands, while higher elevations such as Mount Mecula showed lower fire occurrence. The classification model performed strongly, with Dice coefficients ranging from 91.4% to 94.6%. The resulting atlas offers valuable insights for adaptive fire management, biodiversity conservation, and climate resilience in the NSR and similar savanna ecosystems.
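The Dice coefficient used to report the segmentation performance has a direct set formulation; a minimal sketch (representing each binary mask as a set of burned-pixel coordinates is an illustrative choice):

```python
def dice_coefficient(pred, ref):
    """Dice coefficient between two binary masks given as sets of
    burned-pixel coordinates: 2|A ∩ B| / (|A| + |B|)."""
    pred, ref = set(pred), set(ref)
    if not pred and not ref:
        # Both masks empty: perfect agreement by convention.
        return 1.0
    return 2.0 * len(pred & ref) / (len(pred) + len(ref))

# 3 predicted pixels, 3 reference pixels, 2 shared -> Dice = 4/6.
score = dice_coefficient({(0, 0), (0, 1), (1, 1)},
                         {(0, 0), (0, 1), (2, 2)})
```

Dice weights the overlap against both mask sizes symmetrically, which is why it is a common figure of merit for burned-area segmentation.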
Title: PromptMID: Modal invariant descriptors based on diffusion and vision foundation models for optical-SAR image matching
Authors: Han Nie, Bin Luo, Jun Liu, Zhitao Fu, Huan Zhou, Shuo Zhang, Weixing Liu
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 192–207. Published 2025-09-18. DOI: 10.1016/j.isprsjprs.2025.08.030
Abstract: The ideal goal of generalizable image matching is stable and efficient performance in unseen domains. However, many existing learning-based optical-SAR image matching methods, despite their effectiveness in specific scenarios, exhibit limited generalization and struggle to adapt to practical applications. Repeatedly training or fine-tuning matching models to address domain differences not only lacks elegance but also incurs additional computational overhead and data production costs. In recent years, foundation models have shown significant potential for enhancing generalization, but the disparity in visual domains between natural and remote sensing images hinders their direct application. Effectively leveraging foundation models to improve the generalization of optical-SAR image matching therefore remains a critical challenge. To address it, we propose PromptMID, a novel approach that constructs modality-invariant descriptors using text prompts based on land use classification as prior information for optical and SAR image matching. PromptMID consists of several key stages. First, we fine-tune a diffusion model (DM) on our collected optical images, SAR images, and text prompts to obtain the PromptDM model. Second, we construct modality-invariant descriptors by integrating multi-scale latent diffusion features extracted from the fine-tuned PromptDM with multi-scale features derived from pre-trained visual foundation models (VFMs). To efficiently fuse local-global and texture-semantic features of varying granularities, we design a feature aggregation module (FAM) that ensures comprehensive feature representation. Finally, the discriminative power of the descriptors is enhanced through contrastive learning loss functions, improving the robustness and generalization of matching. Extensive experiments on optical-SAR image datasets from five diverse regions demonstrate that PromptMID outperforms state-of-the-art matching methods in both seen and unseen domains, exhibiting strong cross-domain generalization. The source code will be made publicly available at https://github.com/HanNieWHU/PromptMID.
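The abstract does not name the specific contrastive loss. A common choice for descriptor learning is an InfoNCE-style objective, sketched here in scalar form for one anchor (the temperature value and similarity inputs are illustrative, not PromptMID's settings):

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.07):
    """InfoNCE-style contrastive loss for one anchor: the negative log of
    the softmax probability assigned to the positive similarity among all
    candidates (positive first, then negatives)."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    # Numerically stable log-sum-exp over the candidate set.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denominator)

# A positive far more similar than the negatives gives near-zero loss.
loss = info_nce(0.95, [0.1, 0.05, -0.2])
```

Minimizing this pulls an optical descriptor toward its corresponding SAR descriptor and pushes it away from non-matching ones, which is the discriminative behavior the abstract describes.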
Title: Generation of 30 m resolution monthly burned area product in Africa based on Landsat 8/9 and Sentinel-2 data
Authors: Shunguo Huang, Tengfei Long, Zhaoming Zhang, Guojin He, Guizhou Wang
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 170–191. Published 2025-09-18. DOI: 10.1016/j.isprsjprs.2025.09.012
Abstract: Accurate burned area (BA) detection is critical for understanding fire dynamics and assessing ecological impacts. However, existing continental-scale BA products are mainly of low or medium spatial resolution, making it difficult to detect small or fragmented fires and leading to significant underestimation of burned area. In this study, we propose a novel high-resolution (30 m) monthly BA mapping approach that integrates Sentinel-2 and Landsat 8/9 images on the Google Earth Engine (GEE) platform, and we generate the African Monthly Burned Area product for 2019 (AMBA2019). The workflow starts with a stratified random sampling scheme that intersects MCD12Q1 land-cover classes with GFED5 fire-frequency zones, ensuring spatially representative training samples across diverse ecosystems and fire regimes. A multi-dimensional feature stack for BA detection is then constructed, encompassing fire behavior indicators, vegetation dynamics, moisture stress metrics, and temporal-difference signatures, including newly developed time-aware spectral indices. A two-stage Random Forest classification framework, trained on the stratified samples and these features, identifies candidate burned scars. To refine the preliminary Random Forest outputs, threshold testing, spatial filtering, and a region-growing algorithm are applied to reduce false positives and improve detection of the small fires typically missed by coarse-resolution BA products. Validation against the Burned Area Reference Database (BARD) shows that AMBA2019 achieves overall accuracies of 96.38% and 94.69%, with the lowest commission and omission errors compared with three widely used BA products (MCD64A1, FireCCI51, and FireCCISFD20). This research offers a robust foundation for quantifying fire-induced carbon emissions and enhancing climate modeling capabilities in Africa.
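The temporal-difference signatures in such feature stacks typically build on burn-sensitive spectral indices. A minimal sketch of the standard Normalized Burn Ratio (NBR) and its pre/post-fire difference dNBR, computed from NIR and SWIR-2 reflectance (the band choice matches Landsat 8/9 OLI bands 5 and 7; the reflectance values below are illustrative, and this is not AMBA2019's specific index set):

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio from NIR and SWIR-2 surface reflectance."""
    return (nir - swir2) / (nir + swir2) if (nir + swir2) else 0.0

def dnbr(nir_pre, swir2_pre, nir_post, swir2_post):
    """Temporal-difference signature: pre-fire NBR minus post-fire NBR.
    Larger positive values indicate more severe burning."""
    return nbr(nir_pre, swir2_pre) - nbr(nir_post, swir2_post)

# Healthy vegetation (high NIR, low SWIR) burning to char
# (low NIR, higher SWIR) produces a strongly positive dNBR:
delta = dnbr(0.45, 0.10, 0.15, 0.25)
```

Per-pixel features like this, stacked with land-cover and moisture metrics, are the kind of inputs a Random Forest burned-scar classifier consumes.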
Title: Global urban high-resolution scene classification via uncertainty-aware domain generalization
Authors: Jingjun Yi, Yanfei Zhong, Yu Su, Ruiyi Yang, Yinhe Liu, Junjue Wang
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 92–108. Published 2025-09-17. DOI: 10.1016/j.isprsjprs.2025.08.027
Abstract: Global urban scene classification is a crucial technology for global land use mapping and an important driver of urban intelligence. Applying datasets constructed from urban scenes at a global scale raises two serious problems. First, due to cultural, economic, and other factors, scene styles differ across cities, challenging model generalization. Second, urban scene samples often follow a long-tailed distribution, complicating the identification of tail categories with few samples and impairing performance under domain generalization settings. To tackle these problems, the Uncertainty-aware Domain Generalization urban scene classification (UADG) framework is constructed. To mitigate city-related style differences among global cities, a city-related whitening is proposed that uses whitening operations to separate city-unrelated content features and adaptively preserves city-related information hidden in style features, rather than directly removing style information, thus aiding more robust representations. To counter the sharp accuracy decline of tail classes during domain generalization, estimated uncertainty guides a mixture of experts, assigning suitable experts to hard samples to balance model bias. To evaluate the proposed UADG framework in a practical scenario, the Domain Generalized Urban Scene (DGUS) dataset is curated for validation, with a training set comprising 42 classes of samples from 34 provincial capitals in China and test samples selected from representative cities across six continents. Extensive experiments demonstrate that our method achieves state-of-the-art performance, notably outperforming the baseline GAMMA by 9.79% and 7.42% in average OA and AA, respectively, on the unseen domains of DGUS. UADG greatly enhances the automation of global urban land use mapping.
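The whitening operations referred to above can be illustrated in their simplest form. Style information in deep features is largely carried by channel statistics, so standardizing them suppresses style while leaving content-like structure; this sketch shows only that baseline per-channel standardization, not the paper's adaptive city-related variant:

```python
import math

def instance_whiten(features):
    """Per-channel standardization, the simplest whitening used to
    suppress style statistics: subtract the mean and divide by the
    standard deviation, leaving zero-mean, unit-variance features."""
    n = len(features)
    mean = sum(features) / n
    var = sum((f - mean) ** 2 for f in features) / n
    # Guard against a constant channel (zero variance).
    std = math.sqrt(var) or 1.0
    return [(f - mean) / std for f in features]

w = instance_whiten([2.0, 4.0, 6.0, 8.0])
# Whitened features have zero mean and unit variance.
```

UADG's contribution is precisely to not discard the removed statistics wholesale but to adaptively preserve the city-related part of them; the sketch marks the starting point that design refines.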