ISPRS Journal of Photogrammetry and Remote Sensing: Latest Articles

Domain generalization for semantic segmentation of remote sensing images via vision foundation model fine-tuning
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-17 | DOI: 10.1016/j.isprsjprs.2025.09.004
Authors: Muying Luo, Yujie Zan, Kourosh Khoshelham, Shunping Ji

Abstract: Practice-oriented, general-purpose deep semantic segmentation models must remain effective across varied application scenarios without heavy re-training, or with minimal fine-tuning; this calls for domain generalization ability. Vision Foundation Models (VFMs), trained on massive and diverse datasets, have shown impressive generalization in computer vision tasks, yet how to exploit that ability for remote sensing cross-domain semantic segmentation remains understudied. In this paper, we identify the most suitable VFM for remote sensing images and further enhance its generalization ability for remote sensing image segmentation. Our study begins with a comprehensive evaluation of the generalization ability of various VFMs and classic CNN and transformer backbones under different settings. We find that DINO v2 ViT-L outperforms other backbones, both with frozen parameters and under full fine-tuning. Building on DINO v2, we propose a novel domain generalization framework that operates from both the data and deep-feature perspectives. The framework incorporates two key modules, the Geospatial Semantic Adapter (GeoSA) and the Batch Style Augmenter (BaSA), which together unlock the potential of DINO v2 for remote sensing image semantic segmentation. GeoSA consists of three core components (enhancer, bridge, and extractor) that work synergistically to extract robust features from the pre-trained DINO v2 and generate multi-scale features adapted to remote sensing images. BaSA employs batch-level data augmentation to reduce reliance on dataset-specific features and promote domain-invariant learning. Extensive experiments across four remote sensing datasets and four domain generalization scenarios, covering both binary and multi-class semantic segmentation, consistently demonstrate superior cross-domain generalization and robustness, surpassing advanced domain generalization methods and other VFM fine-tuning methods. Code will be released at https://github.com/mmmll23/GeoSA-BaSA.

Volume 230, Pages 126-146.
Citations: 0
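The abstract does not give BaSA's exact formulation, but batch-level style augmentation is commonly implemented by mixing per-channel feature statistics between samples of the same batch (as in MixStyle-type methods). The PyTorch sketch below illustrates that general idea; the Beta-mixing scheme and all names are illustrative assumptions, not the authors' released code.

```python
import torch

def batch_style_mix(x: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Illustrative batch-level style augmentation (hypothetical; BaSA's
    exact formulation is in the paper). Interpolates per-channel mean/std
    between random pairs of samples so the network cannot latch onto
    dataset-specific style statistics. x: (B, C, H, W)."""
    B = x.size(0)
    mu = x.mean(dim=(2, 3), keepdim=True)           # per-sample channel means
    sig = x.std(dim=(2, 3), keepdim=True) + 1e-6    # per-sample channel stds
    x_norm = (x - mu) / sig                         # strip each sample's own style

    perm = torch.randperm(B, device=x.device)       # random partner per sample
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1)).to(x.device)
    mu_mix = lam * mu + (1 - lam) * mu[perm]        # blended style statistics
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                # re-dress content in mixed style

# Usage: augment a training batch before (or inside) the segmentation network.
aug = batch_style_mix(torch.rand(8, 3, 256, 256))
```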
GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-16 | DOI: 10.1016/j.isprsjprs.2025.09.003
Authors: Xuanguang Liu, Chenguang Dai, Lei Ding, Zhenchao Zhang, Yujie Li, Xibing Zuo, Mengmeng Li, Hanyun Wang, Yuzhe Miao

Abstract: Multi-temporal semantic change detection (MT-SCD) provides crucial information for a wide variety of applications, including land-use monitoring, urban planning, and sustainable development. However, previous deep-learning-based SCD approaches exhibit limitations in time-series semantic change analysis, particularly in understanding Earth-surface change dynamics. Specifically, existing methods typically employ Siamese networks to exploit multi-temporal information, which hinders temporal interactions and fails to comprehensively model spatio-temporal dependencies, causing substantial classification and detection errors in complex scenes. Another key issue is the neglect of temporal transitivity consistency, which yields predictions that contradict the multi-temporal change chain rules inherent to MT-SCD. Furthermore, existing approaches do not adapt dynamically to the number of observation dates and therefore cannot process time-series remote sensing images (RSIs) with arbitrary time steps. To address these challenges, we propose a graph-enhanced spatio-temporal Mamba (GSTM-SCD) for MT-SCD, covering both bi-temporal and time-series SCD. It employs vision state space models to capture the spatio-temporal dependencies in multi-temporal RSIs and leverages graph modeling to enhance inter-temporal dependencies. First, we employ a single-branch Mamba encoder to efficiently exploit multi-temporal semantics and construct a spatio-temporal graph optimization mechanism that facilitates interactions between multi-temporal RSIs while maintaining the spatial continuity of feature representations. Second, we introduce a bidirectional three-dimensional change scanning strategy to learn underlying semantic change patterns. Finally, we propose a novel loss function tailored for time-series SCD that regularizes the multi-temporal topological relationships within the data. GSTM-SCD demonstrates significant accuracy improvements over state-of-the-art (SOTA) methods. Experiments on four open benchmark datasets (SECOND, Landsat-SCD, WUSU, and DynamicEarthNet) show that our method surpasses the SOTA by 0.53%, 1.66%, 9.32%, and 0.78% in SeK, respectively, while significantly reducing computational costs compared with recent SOTA methods. The associated code is available at: https://github.com/liuxuanguang/GSTM-SCD.

Volume 230, Pages 73-91.
Citations: 0
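The "multi-temporal change chain rules" the loss enforces are not spelled out in the abstract. One common way to regularize semantic-change consistency in time-series SCD is to tie the change head to the per-date semantic heads, sketched below in PyTorch; the tensor layouts and the restriction to consecutive date pairs are assumptions, not the published GSTM-SCD loss.

```python
import torch
import torch.nn.functional as F

def change_consistency_loss(sem_logits: torch.Tensor,
                            p_change: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of a time-series consistency term.

    sem_logits: (T, B, K, H, W) semantic logits for T observation dates.
    p_change:   (T-1, B, H, W) predicted change probability between
                consecutive dates. The semantics-implied probability that a
                pixel's class differs between dates t and t+1 should match
                the change head's output; disagreements along the temporal
                chain are penalized for every adjacent pair."""
    probs = sem_logits.softmax(dim=2)             # per-date class probabilities
    p_same = (probs[:-1] * probs[1:]).sum(dim=2)  # P(same class) for adjacent dates
    soft_change = 1.0 - p_same                    # semantics-implied change prob.
    return F.mse_loss(p_change, soft_change)

# Example with T=4 dates, 6 classes, a 2-image batch of 64x64 tiles:
loss = change_consistency_loss(torch.randn(4, 2, 6, 64, 64),
                               torch.rand(3, 2, 64, 64))
```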
Text-Guided Coarse-to-Fine Fusion Network for robust remote sensing visual question answering
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-12 | DOI: 10.1016/j.isprsjprs.2025.08.029
Authors: Zhicheng Zhao, Changfu Zhou, Yu Zhang, Chenglong Li, Xiaoliang Ma, Jin Tang

Abstract: Remote Sensing Visual Question Answering (RSVQA) has gained significant research interest. However, current RSVQA methods are limited by the imaging mechanisms of optical sensors, particularly under challenging conditions such as cloud cover and low light. Given the all-time, all-weather imaging capability of Synthetic Aperture Radar (SAR), it is crucial to investigate the integration of optical and SAR images to improve RSVQA performance. In this work, we propose a Text-Guided Coarse-to-Fine Fusion Network (TGFNet), which leverages the semantic relationships between question text and multi-source images to guide the network toward complementary fusion at the feature level. Specifically, we develop a Text-Guided Coarse-to-Fine Attention Refinement (CFAR) module to focus on key areas related to the question in complex remote sensing images. This module progressively directs attention from broad areas to finer details through key-region routing, enhancing the model's ability to focus on relevant regions. Furthermore, we propose an Adaptive Multi-Expert Fusion (AMEF) module that dynamically integrates different experts, enabling the adaptive fusion of optical and SAR features. In addition, we create the first large-scale benchmark dataset for evaluating optical-SAR RSVQA methods, comprising 7,108 well-aligned optical-SAR image pairs and 1,131,730 well-labeled question-answer pairs across 16 diverse question types, including complex relational reasoning questions. Extensive experiments on the proposed dataset demonstrate that TGFNet effectively integrates complementary information from optical and SAR images, significantly improving performance in challenging scenarios. The dataset is available at: https://github.com/mmic-lcl/.

Volume 230, Pages 1-17.
Citations: 0
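As a concrete illustration of how an expert-gated fusion of optical and SAR features can work, here is a minimal PyTorch sketch; the convolutional experts and the pooled gating MLP are assumptions, since AMEF's actual design is specified in the paper.

```python
import torch
import torch.nn as nn

class AdaptiveExpertFusion(nn.Module):
    """Minimal sketch of gated multi-expert fusion (hypothetical stand-in
    for AMEF). Each expert proposes a fused optical-SAR feature map; a
    gating MLP predicts per-sample mixture weights from the concatenated
    inputs and blends the expert outputs."""

    def __init__(self, dim: int, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * dim, dim, 3, padding=1), nn.GELU())
            for _ in range(n_experts)
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * dim, n_experts), nn.Softmax(dim=-1),
        )

    def forward(self, opt_feat: torch.Tensor, sar_feat: torch.Tensor):
        x = torch.cat([opt_feat, sar_feat], dim=1)        # (B, 2C, H, W)
        w = self.gate(x)                                  # (B, E) mixture weights
        outs = torch.stack([e(x) for e in self.experts])  # (E, B, C, H, W)
        return torch.einsum("be,ebchw->bchw", w, outs)    # weighted blend

# Usage with 256-channel optical and SAR feature maps:
fuse = AdaptiveExpertFusion(dim=256)
y = fuse(torch.rand(2, 256, 32, 32), torch.rand(2, 256, 32, 32))
```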
BUD: Band-limited uncalibrated detector of environmental changes for InSAR monitoring framework
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-12 | DOI: 10.1016/j.isprsjprs.2025.08.032
Authors: Giovanni Costa, Andrea Virgilio Monti Guarnieri, Marco Manzoni, Alessandro Parizzi

Abstract: Synthetic Aperture Radar (SAR) is used in a wide variety of fields, such as monitoring failures and measuring infrastructure health. Detecting spatio-temporal changes in the observed scene is of paramount importance, particularly for hazard prevention. In this paper, we propose a novel nonparametric method, the Band-limited Uncalibrated Detector (BUD), for change detection using InSAR coherence. BUD is a flexible, robust, and responsive tool designed for monitoring applications. It directly inspects the observed data, making inferences without relying on strong theoretical assumptions or requiring calibration with known stable targets. It achieves this by applying a nonparametric statistical hypothesis test to multi-temporal InSAR coherence samples, looking specifically for differences in their statistical distributions. After outlining the theoretical principles of the proposed algorithm, we present a synthetic performance analysis comparing BUD with various state-of-the-art methods. BUD is then applied to two challenging real-world scenarios crucial for monitoring applications: an open-pit mining site, known for frequent and composite environmental changes, and an urban area, which typically experiences infrequent changes demanding highly responsive change detection. In both cases, we provide a comparison with other leading methods. Finally, we cross-validate BUD in the open-pit mine scenario by intersecting analysis results from three different InSAR datasets covering the same area of interest, featuring diverse acquisition geometries and operational bandwidths (X-band and C-band), proposing a novel way to interpret InSAR data. The algorithm's final validation uses available ground-truth data in the urban scenario.

Volume 230, Pages 55-72.
Citations: 0
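The abstract names neither the specific test statistic nor the windowing, so the sketch below uses SciPy's two-sample Kolmogorov-Smirnov test as a stand-in nonparametric distribution test on per-pixel coherence samples. Treat it as a conceptual illustration of the uncalibrated, assumption-free detection idea rather than BUD itself.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_change(coh_ref: np.ndarray, coh_test: np.ndarray,
                  alpha: float = 0.01) -> np.ndarray:
    """Conceptual BUD-style detector using a two-sample KS test (the
    paper's actual statistic may differ).

    coh_ref:  (N_ref, H, W) coherence samples from a reference period.
    coh_test: (N_test, H, W) coherence samples from the monitoring period.
    Returns a boolean (H, W) mask: True where the two coherence
    distributions differ at significance level alpha. No calibration
    against known stable targets is needed; the test is rank-based."""
    H, W = coh_ref.shape[1:]
    mask = np.zeros((H, W), dtype=bool)
    # Per-pixel Python loop kept for clarity; a practical implementation
    # would vectorize or parallelize this.
    for i in range(H):
        for j in range(W):
            _, p = ks_2samp(coh_ref[:, i, j], coh_test[:, i, j])
            mask[i, j] = p < alpha
    return mask

# Usage with synthetic coherence stacks:
mask = detect_change(np.random.rand(20, 16, 16), np.random.rand(15, 16, 16))
```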
Contextual boundary-aware network for semantic segmentation of complex land transportation point cloud scenes
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-12 | DOI: 10.1016/j.isprsjprs.2025.09.006
Authors: Yanming Chen, Jiakang Xia, Xincan Zou, Ziting Xiao, Xin Tang, Yufu Zang, Dong Chen, Yueqian Shen

Abstract: Semantic segmentation of land transportation scenes is critical for infrastructure maintenance and the advancement of intelligent transportation systems. Unlike traditional large-scale scenes, land transportation environments present intricate structural dependencies among infrastructure elements and pronounced class imbalance. To address these challenges, we propose a Gaussian-enhanced positional encoding block that leverages the Gaussian function's intrinsic smoothing and reweighting properties to project relative positional information into a higher-dimensional space. By fusing this enhanced representation with the original positional encoding, the model gains a more nuanced understanding of spatial dependencies among infrastructure elements, improving its capacity for semantic segmentation in complex land transportation scenes. Furthermore, we introduce the Multi-Context Interaction Module (MCIM) into the backbone network, varying the number of MCIMs across network levels to strengthen inter-layer context interactions and mitigate error accumulation. To mitigate class imbalance and excessive object adhesion within the scene, we incorporate a boundary-aware class-balanced (BCB) hybrid loss function. Comprehensive experiments on three distinct land transportation datasets validate the effectiveness of our approach, with comparative analyses against state-of-the-art methods demonstrating its consistent superiority. Specifically, our method attains the highest mIoU (91.8%) and OA (96.7%) on the high-speed rail dataset ExpressRail, the highest mIoU (73.3%) on the traditional railway dataset SNCF, and the highest mF1-score (87.4%) on the urban road dataset Pairs3D. Code is available at: https://github.com/Kange7/CoBa.

Volume 230, Pages 18-31.
Citations: 0
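To make the Gaussian-enhanced positional encoding idea tangible, the sketch below lifts neighbor offsets through a bank of learnable Gaussian radial basis functions before projection; the kernel parameterization and dimensions are assumptions, as the actual block is defined in the paper.

```python
import torch
import torch.nn as nn

class GaussianPositionEncoding(nn.Module):
    """Hypothetical sketch of Gaussian-enhanced relative position encoding.
    Relative offsets of K neighbors around each point are reweighted by a
    bank of Gaussian radial basis functions (the smoothing/reweighting
    property named in the abstract), then fused with the raw offsets."""

    def __init__(self, n_kernels: int = 16, out_dim: int = 32):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(0.0, 1.0, n_kernels))
        self.log_sigma = nn.Parameter(torch.zeros(n_kernels))
        self.proj = nn.Linear(3 + n_kernels, out_dim)

    def forward(self, rel_pos: torch.Tensor) -> torch.Tensor:
        # rel_pos: (B, N, K, 3) offsets of K neighbors around N points
        dist = rel_pos.norm(dim=-1, keepdim=True)        # (B, N, K, 1)
        sigma = self.log_sigma.exp()
        # Gaussian reweighting lifts the scalar distance into n_kernels dims.
        rbf = torch.exp(-0.5 * ((dist - self.centers) / sigma) ** 2)
        return self.proj(torch.cat([rel_pos, rbf], dim=-1))

# Usage: encode 16 neighbors around each of 1024 points.
enc = GaussianPositionEncoding()
feat = enc(torch.randn(2, 1024, 16, 3))   # (2, 1024, 16, 32)
```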
Temporal downscaling meteorological variables to unseen moments: Continuous temporal downscaling via Multi-source Spatial-temporal-wavelet feature Fusion and Time-Continuous Manifold
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-12 | DOI: 10.1016/j.isprsjprs.2025.09.001
Authors: Sheng Gao, Lianlei Lin, Zongwei Zhang, Jiawei Wang

Abstract: Accurate modeling of meteorological variables with high temporal resolution is crucial for simulations and decision-making in aviation, aerospace, and other engineering sectors. Conventional meteorological products typically have temporal resolutions exceeding one hour, hindering the characterization of short-term nonlinear evolution of meteorological variables. Current temporal downscaling methods face insufficient multi-source data fusion, limited extrapolation of data distributions, and inadequate learning of spatiotemporal dependencies, leading to low modeling accuracy and difficulty in modeling meteorological environments at temporal resolutions higher than those of the training data. To address these issues, this study proposes MSF-TCMA (Multi-source Spatial-temporal-wavelet feature Fusion and Time-Continuous Manifold-based Algorithm) for continuous temporal downscaling. The algorithm introduces a multiscale deep-wavelet feature extraction branch for integrating spatial dependence and a cross-modal spatiotemporal information fusion branch for fusing multi-source information and learning temporal dependence. A time-continuous manifold sampling branch addresses the problem of data distribution extrapolation. Finally, continuous downscaling performance is optimized with a joint loss combining multi-moment weighted meteorological state estimation and energy change deduction. Two regional case studies demonstrate that MSF-TCMA achieves modeling errors below 0.65 K for 2-meter temperature, 36.24 Pa for surface pressure, and 0.38 m/s for wind speed over a 6-hour interval, reducing errors by 3.99%-99.64% relative to the comparison methods. Furthermore, two engineering experiments show that the method supports continuous downscaling at multiple moments within a time interval (including moments unseen during training) and downscaling prediction of future meteorological states from GFS forecast data. This study provides a new paradigm for high-precision, high-temporal-resolution reconstruction of meteorological data, with significant application value for the optimization and risk control of complex engineering activities. The code is available at: https://github.com/shermo1415/MSF-TCMA/.

Volume 230, Pages 32-54.
Citations: 0
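The key property enabling downscaling to "unseen moments" is a decoder that accepts continuous time as input. The sketch below shows that generic mechanism: a Fourier embedding of normalized time conditions a small MLP, so the same decoder can be queried at any instant in the interval. MSF-TCMA's actual manifold sampling branch is more elaborate, and all names and dimensions here are assumptions.

```python
import torch
import torch.nn as nn

class TimeContinuousDecoder(nn.Module):
    """Conceptual sketch of decoding a meteorological state at an arbitrary
    moment inside a coarse interval (hypothetical simplification). A latent
    code summarizing the bracketing coarse frames is decoded conditioned on
    a Fourier embedding of normalized time t in [0, 1]."""

    def __init__(self, latent_dim: int = 128, n_freq: int = 8, out_ch: int = 3):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freq))
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 2 * n_freq, 256), nn.GELU(),
            nn.Linear(256, out_ch),
        )

    def forward(self, z: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # z: (B, latent_dim) code fusing the bracketing coarse frames;
        # t: (B,) continuous position of the queried moment in the interval.
        ang = t[:, None] * self.freqs * torch.pi
        t_emb = torch.cat([ang.sin(), ang.cos()], dim=-1)  # smooth time features
        return self.mlp(torch.cat([z, t_emb], dim=-1))     # per-variable values

# Querying three moments, including ones never seen during training:
dec = TimeContinuousDecoder()
z = torch.randn(1, 128)
for t in (0.25, 0.5, 0.8):
    y = dec(z, torch.tensor([t]))   # (1, 3): e.g. temperature, pressure, wind
```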
A systematic survey and meta-analysis of the segment anything model in remote sensing image processing: Challenges, advances, applications, and opportunities
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-10 | DOI: 10.1016/j.isprsjprs.2025.08.023
Authors: Zhipeng Wan, Sheng Wang, Wei Han, Yuewei Wang, Xiaohui Huang, Xiaohan Zhang, Xiaodao Chen, Yunliang Chen

Abstract: In recent years, artificial intelligence (AI) technology has profoundly revolutionized remote sensing (RS), bringing transformative changes from data collection to analysis. Traditional remote sensing image interpretation (RSII) relies on manual interpretation and task-specific models, which suffer from low efficiency, high costs, and poor generalization, making them inadequate for large-scale data processing and complex tasks. The emergence of foundation models (FMs), i.e., large pre-trained AI models, has not only significantly improved efficiency and accuracy but also enabled diverse tasks to be executed efficiently. Notably, the Segment Anything Model (SAM) has challenged traditional visual paradigms, sparking widespread interest in task-agnostic visual FMs. Its exceptional zero-shot generalization has demonstrated outstanding performance in natural scenes, offering new perspectives and methodologies for the automation of RSII. However, there are significant differences in spatial characteristics and data structures between RS images and natural images, so the application potential of SAM in RSII has yet to be comprehensively evaluated. Although existing studies have demonstrated SAM's adaptability in RSII, the literature lacks systematic and in-depth reviews. To fill this gap, this study conducts, for the first time, a comprehensive review and meta-analysis of the challenges, advances, applications, and potential of SAM in RSII. The paper first reviews SAM's advances in RS and compiles relevant research findings. It then analyzes the inherent challenges of RS and the bottlenecks of SAM in this domain, including semantic information loss, discrepancies between training and target domains, prompt dependency and design complexity, and insufficient robustness. Next, it outlines a meta-analysis revealing the research status of SAM in RS. The paper then examines SAM adaptation methods for RS image processing and evaluates performance in both general and specific RS tasks. Finally, future research directions are summarized. To support the continued development of this field, a dedicated repository is maintained at https://github.com/WanZhan-lucky/WanSAM4RS-Tracker.

Volume 229, Pages 436-466.
Citations: 0
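For readers new to SAM, the snippet below shows the vanilla prompt-based inference loop from Meta's segment-anything package applied to a remote sensing tile. The checkpoint path, tile array, and point prompt are placeholders; the RS-specific adaptations surveyed in the paper would wrap or fine-tune this backbone.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H SAM backbone from its released checkpoint file.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Stand-in for an RGB remote sensing tile (HWC, uint8).
tile = np.zeros((1024, 1024, 3), dtype=np.uint8)
predictor.set_image(tile)            # one-time image embedding

# A single foreground point prompt (label 1) on a target of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 512]]),
    point_labels=np.array([1]),
    multimask_output=True,           # return several candidate masks
)
best = masks[scores.argmax()]        # keep the highest-scoring mask
```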
Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-09 | DOI: 10.1016/j.isprsjprs.2025.08.022
Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

Abstract: Domain adaptation (DA) techniques aim to close the gap between source and target domains, enabling deep learning models to generalize across different data-shift paradigms for point cloud semantic segmentation (PCSS). Among emerging DA schemes, test-time adaptation (TTA) adapts a pre-trained model directly to unlabeled data during inference, without access to source-domain data and without an additional training process, which mitigates data privacy concerns and removes the need for substantial computational power. To fill the gap in leveraging TTA for geospatial PCSS, we introduce three typical domain-shift paradigms for geospatial point clouds and construct three practical adaptation benchmarks: photogrammetric point clouds to airborne LiDAR, airborne LiDAR to mobile LiDAR, and synthetic to mobile LiDAR. We then propose a TTA method that exploits the domain-specific knowledge embedded within the batch normalization (BN) layers. Given the pre-trained model, BN statistics are progressively updated by fusing the statistics of each testing batch. Furthermore, we develop a self-supervised module to optimize the learnable BN affine parameters: information maximization is used to generate confident and category-specific predictions, and reliability-constrained pseudo-labeling is further incorporated to create supervisory signals. Extensive experimental analysis demonstrates that the proposed method improves classification accuracy by up to 20% mIoU over direct inference, outperforming other popular counterparts while maintaining high efficiency and avoiding retraining. In an adaptation from photogrammetric (SensatUrban) to airborne (Hessigheim 3D) data, our method achieves a mIoU of 59.46% and an OA of 85.97%.

Volume 229, Pages 422-435.
Citations: 0
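The two mechanisms described in the abstract, progressive fusion of BN statistics and information maximization over the BN affine parameters, translate naturally into PyTorch. The sketch below follows that description under stated assumptions (momentum value, logits layout); the reliability-constrained pseudo-labeling component is omitted for brevity.

```python
import torch
import torch.nn as nn
from torch.nn.modules.batchnorm import _BatchNorm

@torch.no_grad()
def fuse_bn_statistics(model: nn.Module, x: torch.Tensor, momentum: float = 0.1):
    """Blend source-domain BN running mean/var with the current test
    batch's statistics (the momentum schedule here is an assumption)."""
    was_training = model.training
    for m in model.modules():
        if isinstance(m, _BatchNorm):
            m.train()              # running stats update only in train mode
            m.momentum = momentum  # controls the source/target blending rate
    model(x)                       # forward pass refreshes running statistics
    model.train(was_training)

def bn_affine_params(model: nn.Module):
    """Yield only the learnable BN affine parameters; all else stays frozen."""
    for m in model.modules():
        if isinstance(m, _BatchNorm):
            for p in (m.weight, m.bias):
                if p is not None:
                    yield p

def tta_step(model: nn.Module, x: torch.Tensor,
             optimizer: torch.optim.Optimizer) -> float:
    """One self-supervised step with an information-maximization objective:
    confident (low per-point entropy) yet class-diverse (high marginal
    entropy) predictions. Assumes the model outputs (N, K) class logits."""
    logits = model(x)
    p = logits.softmax(dim=-1)
    ent = -(p * p.clamp_min(1e-8).log()).sum(-1).mean()      # per-point entropy
    marginal = p.mean(dim=0)
    neg_div = (marginal * marginal.clamp_min(1e-8).log()).sum()
    loss = ent + neg_div          # minimize entropy, maximize class diversity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: opt = torch.optim.Adam(bn_affine_params(model), lr=1e-4)
```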
ARTEMIS: A real-time efficient ortho-mapping and thematic identification system for UAV-based rapid response
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-09 | DOI: 10.1016/j.isprsjprs.2025.08.026
Authors: Yijun Liu, Akram Akbar, Ting Yu, Yunlong Yu, Yuanhang Kong, Jingwen Gao, Honghao Wang, Yanyi Li, Hongduo Zhao, Chun Liu

Abstract: Rapid response to natural and human-made disasters requires both real-time mapping and identification of key targets-of-interest (TOIs), capabilities missing in conventional Structure-from-Motion (SfM)-based unmanned aerial vehicle (UAV) mapping frameworks. While Simultaneous Localization and Mapping (SLAM)-based systems offer real-time capability, they depend heavily on GPUs and reliable GNSS to process challenging UAV imagery with high resolution (> 10 megapixels) and low overlap (60%-90%). These prerequisites are often unavailable in resource-constrained post-disaster deployments. To address these limitations, we introduce ARTEMIS, a CPU-centric, real-time ortho-mapping system with direct map interpretation capability. Key innovations include: (1) a projection-error-guided window search strategy, derived from generalized stereo geometry, that enables robust and efficient feature matching using lightweight descriptors (e.g., ORB) on challenging aerial data; (2) a novel, lightweight matching confidence metric that enables adaptive weighting within bundle adjustment (BA), prioritizing high-quality matches to enhance accuracy without tight GNSS reliance; and (3) an end-to-end workflow that outputs thematic analysis automatically, using integrated state-of-the-art deep learning models (supervised and zero-shot) to identify key TOIs within the resulting Digital Orthophoto Maps (DOMs). To the best of our knowledge, this is the first study to develop and validate such an end-to-end system on real-world disaster datasets collected by first responders, covering geophysical (e.g., earthquakes), hydrological (e.g., debris flows), climatological (e.g., wildfires), and meteorological (e.g., hurricanes) events. Extensive experiments show that ARTEMIS performs up to 58× faster than SfM methods (e.g., COLMAP) in sparse reconstruction and 22× faster than commercial solutions (e.g., ContextCapture) in DOM generation, while maintaining < 0.5 m absolute positioning error. In mission-critical tasks such as damage assessment, its thematic analysis achieves results (e.g., F1-scores and mIoU) directly comparable to offline, post-processed baselines. By bridging the gap between raw data collection and trustworthy intelligence, ARTEMIS demonstrates significant potential to empower immediate, informed decision-making in UAV-assisted emergency response.

Volume 229, Pages 396-421.
Citations: 0
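The projection-guided window search can be pictured with plain OpenCV: predict where each keypoint should land in the next frame using prior geometry, then match ORB descriptors only inside a small window around the prediction. In the sketch below the prior homography, window radius, and the use of raw descriptor distance as a confidence proxy are all assumptions; ARTEMIS derives its search window from generalized stereo geometry.

```python
import cv2
import numpy as np

def windowed_orb_match(img_a, img_b, H_prior, radius: float = 64.0):
    """Conceptual projection-guided ORB matching (not ARTEMIS's derivation).

    img_a, img_b: grayscale uint8 frames. H_prior: hypothetical 3x3 prior
    homography (e.g., predicted from GNSS/IMU or the previous solution).
    Each keypoint in img_a is projected into img_b and only candidates
    within `radius` pixels of the prediction are considered, which keeps
    lightweight binary descriptors reliable on low-overlap aerial frames."""
    orb = cv2.ORB_create(4000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    pts_a = cv2.perspectiveTransform(
        np.float32([k.pt for k in kp_a]).reshape(-1, 1, 2), H_prior
    ).reshape(-1, 2)                                # predicted positions in img_b
    pts_b = np.float32([k.pt for k in kp_b])

    matches = []
    for i, p in enumerate(pts_a):
        d2 = ((pts_b - p) ** 2).sum(axis=1)         # distance to prediction
        cand = np.flatnonzero(d2 < radius ** 2)     # window gate
        if cand.size == 0:
            continue
        hd = [cv2.norm(des_a[i], des_b[j], cv2.NORM_HAMMING) for j in cand]
        j = cand[int(np.argmin(hd))]
        matches.append((i, int(j), float(min(hd)))) # (idx_a, idx_b, distance)
    return matches
```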
Sea Level Anomaly prediction with TSTA-enhanced UNet
IF 12.2 | Q1 (Earth Science)
ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-08 | DOI: 10.1016/j.isprsjprs.2025.08.005
Authors: Qinxuan Wang, Jun Bai, Yineng Li, Shiming Xiang, Xiaoqing Chu, Yue Sun, Tielin Zhang

Abstract: The prediction of Sea Level Anomaly (SLA) is crucial for many marine and meteorological applications. Most recently developed SLA prediction methods build on the Recurrent Neural Network (RNN) framework and its variants, which suffer from insufficient capability to capture spatial information and low computational efficiency. To address these issues, this paper proposes UNet and Temporal-Spatial Transformer Attention (UNet-TSTA), a novel method for accurate and efficient SLA prediction. In our model, UNet serves as the backbone of the prediction network, enhancing the ability to capture features of sea-surface eddies at different scales. Meanwhile, the TSTA module innovatively constructs multiple spatial-temporal planes through free combinations of the temporal and spatial dimensions, using the attention mechanism of the Point-by-Point Vision Transformer (P-ViT). The effective cooperation of P-ViT and CNN also speeds up training and inference. Experimental results on real SLA datasets show that UNet-TSTA achieves millimeter-level average precision in predicting SLA fields for the next seven days. Compared with other advanced algorithms, our method shows significant improvements in both computational efficiency and prediction precision.

Volume 229, Pages 382-395.
Citations: 0
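The TSTA idea of attending along freely chosen spatial-temporal planes can be sketched by folding all but one axis into the batch and running standard multi-head attention along the remaining axis, so for example the temporal evolution of each grid point attends point by point. The residual stacking, shared weights, and dimensions below are assumptions rather than the published P-ViT design.

```python
import torch
import torch.nn as nn

class PlaneAttention(nn.Module):
    """Hypothetical simplification of plane-wise spatio-temporal attention:
    the chosen axis becomes the token sequence and all other axes are
    folded into the batch dimension."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # x: (B, T, H, W, C); axis in {1, 2, 3} selects the token dimension.
        x = x.movedim(axis, -2)                    # tokens on the chosen axis
        shape = x.shape
        seq = x.reshape(-1, shape[-2], shape[-1])  # fold remaining axes into batch
        out, _ = self.attn(seq, seq, seq)
        return out.reshape(shape).movedim(-2, axis)  # restore original layout

# A TSTA-like stack: temporal plane, then the two spatial planes, with
# residual connections (weights shared across planes for brevity).
x = torch.rand(2, 7, 32, 48, 64)                   # (B, T, H, W, C) SLA features
pa = PlaneAttention(dim=64)
for ax in (1, 2, 3):                               # T-, H-, W-axis attention
    x = x + pa(x, ax)
```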