Latest Articles in ISPRS Journal of Photogrammetry and Remote Sensing

Impact of forest disturbance derived from Sentinel-2 time series on Landsat 8/9 land surface temperature: The case of Norway spruce in Central Germany
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-25. DOI: 10.1016/j.isprsjprs.2025.07.006
Simon Grieger , Martin Kappas , Susanne Karel , Philipp Koal , Tatjana Koukal , Markus Löw , Martin Zwanzig , Birgitta Putzenlechner
{"title":"Impact of forest disturbance derived from Sentinel-2 time series on Landsat 8/9 land surface temperature: The case of Norway spruce in Central Germany","authors":"Simon Grieger ,&nbsp;Martin Kappas ,&nbsp;Susanne Karel ,&nbsp;Philipp Koal ,&nbsp;Tatjana Koukal ,&nbsp;Markus Löw ,&nbsp;Martin Zwanzig ,&nbsp;Birgitta Putzenlechner","doi":"10.1016/j.isprsjprs.2025.07.006","DOIUrl":"10.1016/j.isprsjprs.2025.07.006","url":null,"abstract":"<div><div>Forest cover and vitality loss is a global phenomenon. Areas of Norway spruce (<em>Picea abies</em> (L.) Karst.) in Central Germany were affected by widespread vitality and canopy cover loss in the years from 2018 due to drought stress and pest infestation. Such disturbances can favor higher land surface temperature (LST) on cloudless summer days. Regional assessment of LST in disturbed forest stands is challenging due to the spatial and temporal resolution of available products and various influences on the surface energy budget. To assess the effects of forest disturbance and topographic and pedological site factors on LST, a time series of the Landsat 8/9 Surface Temperature product was combined with a Sentinel-2-based forest disturbance monitoring framework. Results from three regions in Central Germany indicate a trend of elevated LST in disturbed areas of Norway spruce (median of LST differences of 4.4 K compared to undisturbed areas). Among topographic site factors, elevation exhibits the highest influence (median of LST differences between disturbed and undisturbed areas 1.2 K higher for highest areas compared to lowest). For pedological site factors, substrate shows the highest effect, modulating the median of LST differences by 2.9 K. Forest disturbance is accompanied by increased LST variance, possibly caused by different post-disturbance forest management practices. Air temperature at 15 cm shows highest agreement with LST and supports variation among management types. Identification of sites with a high risk of elevated LST is crucial for decision making in post-disturbance forest management, successful reforestation, and establishment of resilient forests.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 388-407"},"PeriodicalIF":10.6,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
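The headline statistic above (a 4.4 K median LST difference) reduces to a masked median comparison once the disturbance map and the LST scene share a grid. A minimal sketch, assuming co-registered numpy arrays and synthetic data; this is not the authors' processing chain:

```python
# Minimal sketch: median LST difference between disturbed and undisturbed
# forest pixels. Array names, shapes, and data are illustrative assumptions.
import numpy as np

def median_lst_difference(lst: np.ndarray, disturbed: np.ndarray) -> float:
    """lst: Landsat 8/9 LST scene in Kelvin (2-D float array, NaN = cloud/no-data).
    disturbed: boolean mask from a Sentinel-2 disturbance product (same grid)."""
    lst_dist = np.nanmedian(lst[disturbed])
    lst_undist = np.nanmedian(lst[~disturbed])
    return float(lst_dist - lst_undist)

# Example with synthetic data (the paper reports ~4.4 K for Norway spruce).
rng = np.random.default_rng(0)
lst = rng.normal(300.0, 2.0, size=(512, 512))
mask = np.zeros_like(lst, dtype=bool)
mask[:256] = True
lst[mask] += 4.4  # emulate warming of disturbed stands
print(f"median LST difference: {median_lst_difference(lst, mask):.1f} K")
```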
BdFusion: Bi-directional visual-LiDAR fusion for resilient place recognition
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-25. DOI: 10.1016/j.isprsjprs.2025.07.022
Anbang Liang , Zhipeng Chen , Wen Xiong , Fanyi Meng , Yu Yin , Dejin Zhang , Qingquan Li
{"title":"BdFusion: Bi-directional visual-LiDAR fusion for resilient place recognition","authors":"Anbang Liang ,&nbsp;Zhipeng Chen ,&nbsp;Wen Xiong ,&nbsp;Fanyi Meng ,&nbsp;Yu Yin ,&nbsp;Dejin Zhang ,&nbsp;Qingquan Li","doi":"10.1016/j.isprsjprs.2025.07.022","DOIUrl":"10.1016/j.isprsjprs.2025.07.022","url":null,"abstract":"<div><div>Place recognition is an essential component for simultaneous localization and mapping (SLAM), as it is widely employed in loop closure detection to mitigate trajectory drifts. However, current works based on images or point clouds data are facing challenges in complex environments: single-modality methods may fail in degraded scenes; while conventional fusion methods simply combine multiple sensors data but ignore the problem that contributions of different features will dynamically change in different scenes. To improve the resilience of place recognition in complex environments, we propose a novel attention-based visual-LiDAR fusion method which is named BdFusion. In this work, a bi-directional attention module is proposed to improve the robustness of feature representation in changing environments, which performs explicit cross-modal feature interaction and enhancement by mining complementary features between 2D images and 3D point clouds. Furthermore, we design a feature fusion network that leverages multi-scale space and channel attention to comprehensively optimize the feature representation and fusion process, so as to learn the complementary advantages of multi modalities and perform adaptive feature fusion. Based on the fused feature, discriminative global descriptor is eventually constructed for place retrieval. We evaluate the proposed method on the self-built complex environment dataset and several public datasets. The experimental results show that our method outperforms existing state-of-the-art models such as PRFusion and AdaFusion on the challenging Szu dataset, achieving +1.6 Average Recall@1 (AR@1) and +0.6 Average Precision (AP), which effectively improves the accuracy and reliability of place recognition in complex environments. The code and dataset are publicly available at <span><span>https://github.com/ThomasLiangAB/BdFusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 408-419"},"PeriodicalIF":10.6,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
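The bi-directional attention module described above can be approximated with two standard cross-attention blocks, one per direction. A minimal PyTorch sketch; dimensions, token counts, and the residual/norm layout are illustrative assumptions, not the published architecture:

```python
# Image tokens attend to point-cloud tokens and vice versa, so each modality
# is enhanced with complementary context from the other.
import torch
import torch.nn as nn

class BiDirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.img_from_pc = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pc_from_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_pc = nn.LayerNorm(dim)

    def forward(self, img_tokens, pc_tokens):
        img_enh, _ = self.img_from_pc(img_tokens, pc_tokens, pc_tokens)
        pc_enh, _ = self.pc_from_img(pc_tokens, img_tokens, img_tokens)
        return (self.norm_img(img_tokens + img_enh),
                self.norm_pc(pc_tokens + pc_enh))

img = torch.randn(2, 196, 256)   # e.g. flattened CNN feature map
pc = torch.randn(2, 1024, 256)   # e.g. pooled point-cloud features
img_f, pc_f = BiDirectionalCrossAttention()(img, pc)
print(img_f.shape, pc_f.shape)   # (2, 196, 256) (2, 1024, 256)
```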
URSimulator: Human-perception-driven prompt tuning for enhanced virtual urban renewal via diffusion models
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-24. DOI: 10.1016/j.isprsjprs.2025.07.016
Chuanbo Hu , Shan Jia , Xin Li
{"title":"URSimulator: Human-perception-driven prompt tuning for enhanced virtual urban renewal via diffusion models","authors":"Chuanbo Hu ,&nbsp;Shan Jia ,&nbsp;Xin Li","doi":"10.1016/j.isprsjprs.2025.07.016","DOIUrl":"10.1016/j.isprsjprs.2025.07.016","url":null,"abstract":"<div><div>Tackling Urban Physical Disorder (UPD) – such as abandoned buildings, litter, messy vegetation, and graffiti – is essential, as it negatively impacts the safety, well-being, and psychological state of communities. Urban Renewal is the process of revitalizing these neglected and decayed areas within a city to improve their physical environment and quality of life for residents. Effective urban renewal efforts can transform these environments, enhancing their appeal and livability. However, current research lacks simulation tools that can quantitatively assess and visualize the impacts of urban renewal efforts, often relying on subjective judgments. Such simulation tools are essential for planning and implementing effective renewal strategies by providing a clear visualization of potential changes and their impacts. This paper presents a novel framework that addresses this gap by using human perception feedback to simulate the enhancement of street environment. We develop a prompt tuning approach that integrates text-driven Stable Diffusion with human perception feedback. This method iteratively edits local areas of street view images, aligning them more closely with human perceptions of beauty, liveliness, and safety. Our experiments show that this framework significantly improves people’s perceptions of urban environments, with increases of 17.60% in safety, 31.15% in beauty, and 28.82% in liveliness. In comparison, other advanced text-driven image editing methods like DiffEdit only achieve improvements of 2.31% in safety, 11.87% in beauty, and 15.84% in liveliness. We applied this framework across various virtual scenarios, including neighborhood improvement, building redevelopment, green space expansion, and community garden creation. The results demonstrate its effectiveness in simulating urban renewal, offering valuable insights for real-world urban planning and policy-making. This method not only enhances the visual appeal of neglected urban areas but also serves as a powerful tool for city planners and policymakers, ultimately improving urban landscapes and the quality of life for residents.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 356-369"},"PeriodicalIF":10.6,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144695047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
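The iterative loop the abstract describes (edit a local region, score it with a perception model, keep improvements) can be sketched as a greedy search over candidate prompts. The inpainting call below uses the public diffusers API, but the checkpoint id is illustrative, and perception_score and the acceptance rule are hypothetical stand-ins for the paper's tuned components:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

def perception_score(image) -> float:
    """Hypothetical proxy for learned beauty/liveliness/safety ratings."""
    raise NotImplementedError  # e.g. a CNN trained on crowd-sourced ratings

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # illustrative checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

def renew(street_view, region_mask, candidate_prompts, rounds: int = 3):
    """Greedy prompt search: keep an edit only if it raises perceived quality."""
    best, best_score = street_view, perception_score(street_view)
    for _ in range(rounds):
        for prompt in candidate_prompts:
            edited = pipe(prompt=prompt, image=best,
                          mask_image=region_mask).images[0]
            score = perception_score(edited)
            if score > best_score:
                best, best_score = edited, score
    return best, best_score
```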
UC–Change: a classification-based time series change detection technique for improved forest disturbance mapping using multi-sensor imagery
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-24. DOI: 10.1016/j.isprsjprs.2025.07.028
Ilia Parshakov , Derek Peddle , Karl Staenz , Jinkai Zhang , Craig Coburn , Howard Cheng
{"title":"UC–Change: a classification-based time series change detection technique for improved forest disturbance mapping using multi-sensor imagery","authors":"Ilia Parshakov ,&nbsp;Derek Peddle ,&nbsp;Karl Staenz ,&nbsp;Jinkai Zhang ,&nbsp;Craig Coburn ,&nbsp;Howard Cheng","doi":"10.1016/j.isprsjprs.2025.07.028","DOIUrl":"10.1016/j.isprsjprs.2025.07.028","url":null,"abstract":"<div><div>Unsupervised Classification to Change (UC–Change) is a versatile technique that detects forest disturbances in satellite images by analyzing changes in the spatial distribution of spectral classes over time. This approach can fully utilize the spectral resolution of individual sensors without requiring atmospheric correction or radiometric normalization. Resulting multisensor capabilities set UC–Change apart from established time-series change detection methods, such as Continuous Change Detection and Classification (CCDC), LandTrendr, Composite2Change (C2C), and Global Forest Change (GFC). With the growing number of Earth observation satellites, the ability to utilize diverse datasets is increasingly important for extracting information relevant to sustainable natural resource management. The algorithm’s effectiveness is demonstrated using a dataset containing 275 Landsat and Sentinel–2 images acquired over a forested area in British Columbia, Canada, from 1972 to 2020. The 100 km × 100 km study site has been actively harvested in recent decades and experienced many wildfires and a mountain pine beetle (MPB) outbreak. The spatio-temporal accuracy of clearcut and fire-scar detection was assessed using the Vegetation Resources Inventory (VRI) and National Burned Area Composite (NBAC) products, respectively, and compared against the C2C 1985 – 2020, CCDC 2002 – 2019, and GFC 2001 – 2022 maps available online. Overall, the UC–Change algorithm detected 85.2 % of the reference VRI 1974 – 2018 cutblock pixels at a temporal resolution of ± 1 year (90.3 % at ± 3 years). It detected 86.0 % of 1985 – 2018 VRI pixels, outperforming C2C (58.8 %). For the period 2002 – 2018, UC–Change mapped 87.1 % of the reference cutblock pixels, exceeding C2C (54.5 %), CCDC (74.0 %), and GFC (70.4 %). UC–Change, C2C, CCDC, and GFC detected 71.0 %, 54.6 %, 37.4 %, and 67.2 % of 2006 – 2018 reference forest fire pixels, respectively. UC–Change provided improved forest harvest and fire-scar detection in areas heavily affected by the MPB outbreak and forests characterized by low canopy cover. It represents a new, fundamentally different approach to time-series analysis, suitable for independent use or in concert with existing methods.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 370-387"},"PeriodicalIF":10.6,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144695048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
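A drastically reduced sketch of the classification-to-change idea: cluster each date independently, then flag pixels whose class transition is rare for their initial class. This illustrates the principle only; the published UC–Change algorithm handles multi-sensor time series and class correspondence far more carefully, and the cluster count and persistence threshold here are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def classify(image: np.ndarray, k: int = 20, seed: int = 0) -> np.ndarray:
    """image: (rows, cols, bands) array -> (rows, cols) spectral class map."""
    h, w, b = image.shape
    labels = KMeans(n_clusters=k, n_init=4, random_state=seed).fit_predict(
        image.reshape(-1, b))
    return labels.reshape(h, w)

def change_mask(t1: np.ndarray, t2: np.ndarray, k: int = 20,
                min_persistence: float = 0.5) -> np.ndarray:
    c1, c2 = classify(t1, k), classify(t2, k)
    transitions = np.zeros((k, k))
    np.add.at(transitions, (c1.ravel(), c2.ravel()), 1)
    # A t2 class absorbing >= min_persistence of a t1 class's pixels is
    # treated as the same (stable) surface; any other transition is change.
    row = transitions.sum(axis=1, keepdims=True)
    stable = transitions >= min_persistence * np.maximum(row, 1)
    return ~stable[c1, c2]
```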
HATFormer: Height-aware Transformer for multimodal 3D change detection
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-23. DOI: 10.1016/j.isprsjprs.2025.06.022
Biyuan Liu , Zhou Huang , Yanxi Li , Rongrong Gao , Huai-Xin Chen , Tian-Zhu Xiang
{"title":"HATFormer: Height-aware Transformer for multimodal 3D change detection","authors":"Biyuan Liu ,&nbsp;Zhou Huang ,&nbsp;Yanxi Li ,&nbsp;Rongrong Gao ,&nbsp;Huai-Xin Chen ,&nbsp;Tian-Zhu Xiang","doi":"10.1016/j.isprsjprs.2025.06.022","DOIUrl":"10.1016/j.isprsjprs.2025.06.022","url":null,"abstract":"<div><div>Understanding the three-dimensional dynamics of the Earth’s surface is essential for urban planning and environmental monitoring. In the absence of consistent bitemporal 3D data, recent advancements in change detection have increasingly turned to combining multimodal data sources, including digital surface models (DSMs) and optical remote sensing imagery. However, significant inter-modal differences and intra-class variance — particularly with imbalances between foreground and background classes — continue to pose major challenges for achieving accurate change detection. To address these challenges, we propose a height-aware Transformer network, termed HATFormer, for multimodal semantic and height change detection, which explicitly correlates features across different modalities to reduce modality gaps and incorporates additional background supervision to mitigate foreground-to-background imbalances. Specifically, we first introduce a Background Height Estimation (BHE) module that incorporates height-awareness learning within the background to predict height information directly from lateral image features. This module enhances discriminative background feature learning and reduces the modality gap between monocular images and DSM data. To alleviate the interference of noisy background heights, a Height Uncertainty Suppression (HUS) module is designed to suppress the regions with height uncertainty. Secondly, we propose a Foreground Mask Estimation (FME) module to identify foreground change regions from DSM features, guided by discriminative background features. This module also acts as a regularizer, supporting more effective feature learning within the BHE module. Finally, an Auxiliary Feature Aggregation (AFA) module is designed to integrate features from the FME and BHE modules, which are then decoded by a multi-task decoder to generate precise change predictions. Extensive experiments on the Hi-BCD Plus and SMARS datasets demonstrate that our proposed method outperforms eight state-of-the-art methods, achieving superior performance in semantic and height change detection from multimodal bitemporal data. The code and dataset will be publicly available at: <span><span>https://github.com/HATFormer/HATFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 340-355"},"PeriodicalIF":10.6,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144686373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
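The abstract's multi-task decoder ends in two heads, one for semantic change and one for height change. A minimal sketch of such an output stage; channel counts, class count, and layer choices are illustrative assumptions, not the published design:

```python
import torch
import torch.nn as nn

class MultiTaskChangeHead(nn.Module):
    def __init__(self, in_ch: int = 256, n_classes: int = 6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
        )
        self.semantic = nn.Conv2d(128, n_classes, 1)  # semantic change logits
        self.height = nn.Conv2d(128, 1, 1)            # signed height change (m)

    def forward(self, fused):
        x = self.trunk(fused)
        return self.semantic(x), self.height(x)

fused = torch.randn(2, 256, 64, 64)   # fused bitemporal DSM+image features
sem, dh = MultiTaskChangeHead()(fused)
print(sem.shape, dh.shape)            # (2, 6, 64, 64) (2, 1, 64, 64)
```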
SAR ship detection across different spaceborne platforms with confusion-corrected self-training and region-aware alignment framework
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-22. DOI: 10.1016/j.isprsjprs.2025.07.017
Shuang Liu , Dong Li , Haibo Song , Caizhi Fan , Ke Li , Jun Wan , Ruining Liu
{"title":"SAR ship detection across different spaceborne platforms with confusion-corrected self-training and region-aware alignment framework","authors":"Shuang Liu ,&nbsp;Dong Li ,&nbsp;Haibo Song ,&nbsp;Caizhi Fan ,&nbsp;Ke Li ,&nbsp;Jun Wan ,&nbsp;Ruining Liu","doi":"10.1016/j.isprsjprs.2025.07.017","DOIUrl":"10.1016/j.isprsjprs.2025.07.017","url":null,"abstract":"<div><div>Synthetic Aperture Radar (SAR) ship detection is a vital technology for transforming reconnaissance data into actionable intelligence. As spaceborne SAR platforms increase, significant distribution shifts arise among SAR data from different platforms due to diverse imaging conditions and technical parameters. Traditional deep learning detectors, typically optimized for single-platform data, struggle with such shifts and annotation scarcity, limiting cross-platform applicability. Mainstream methods employ unsupervised domain adaptation (UDA) techniques to transfer detectors from a labeled source domain (existing platform data) to a novel unlabeled target domain (new platform data). However, the inherent complexity of SAR images, particularly strong background scattering regions, causes high confusion between ships and non-target regions, making these methods vulnerable to background interference and reducing their effectiveness in cross-platform detection. To alleviate this, we propose a <u>C</u>onfusion-Corrected <u>S</u>elf-Training with <u>R</u>egion-Aware <u>F</u>eature <u>A</u>lignment (CSRFA) framework for cross-platform SAR ship detection. First, a Confusion-corrected Self-training Mechanism (CSM) refines and corrects misclassified proposals to suppress background interference and enhance pseudo-label reliability on unlabeled target domains. Then, a Foreground Guidance Mechanism (FGM) further improves proposal quality by exploiting the consistency between region proposal classification and localization. Finally, a Region-Aware Feature Alignment (RAFA) module aligns ship regions based on RPN-generated foreground probabilities, enabling fine-grained, target-aware domain adaptation. Experiments on GF-3, SEN-1, and HRSID datasets show that CSRFA consistently outperforms existing UDA methods, achieving an average AP improvement of 2% across six cross-platform tasks compared to the second-best approach, demonstrating its robustness and adaptability for practical deployment.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 305-322"},"PeriodicalIF":10.6,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144680105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
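Self-training of the kind CSRFA builds on starts from confidence-filtered pseudo-labels on the unlabeled target platform. A minimal sketch using the torchvision detection output convention; the threshold and detector interface are assumptions, and the paper's confusion correction, foreground guidance, and region-aware alignment are not reproduced here:

```python
import torch

@torch.no_grad()
def pseudo_labels(detector, target_images, score_thr: float = 0.8):
    """detector(images) -> list of dicts with 'boxes' (N, 4) and 'scores' (N,),
    the torchvision detection convention. Keeps only confident ship proposals
    so the self-training loop is not poisoned by confusable background."""
    detector.eval()
    kept = []
    for preds in detector(target_images):
        keep = preds["scores"] >= score_thr
        kept.append({"boxes": preds["boxes"][keep],
                     "scores": preds["scores"][keep]})
    return kept
```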
Colour-informed ecoregion analysis highlights a satellite capability gap for spatially and temporally consistent freshwater cyanobacteria monitoring
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-22. DOI: 10.1016/j.isprsjprs.2025.07.030
Davide Lomeo , Stefan G.H. Simis , Nick Selmes , Anne D. Jungblut , Emma J. Tebbs
{"title":"Colour-informed ecoregion analysis highlights a satellite capability gap for spatially and temporally consistent freshwater cyanobacteria monitoring","authors":"Davide Lomeo ,&nbsp;Stefan G.H. Simis ,&nbsp;Nick Selmes ,&nbsp;Anne D. Jungblut ,&nbsp;Emma J. Tebbs","doi":"10.1016/j.isprsjprs.2025.07.030","DOIUrl":"10.1016/j.isprsjprs.2025.07.030","url":null,"abstract":"<div><div>Cyanobacteria blooms pose significant risks to water quality in freshwater ecosystems worldwide, with implications for human and animal health. Constructing consistent records of cyanobacteria dynamics in complex inland waters from satellite imagery remains challenged by discontinuous sensor capabilities, particularly with regard to spectral coverage. Comparing 11 satellite sensors, we show that the number and positioning of wavebands fundamentally alter bloom detection capability, with wavebands centred at 412, 620, 709, 754 and 779 nm proving most critical for capturing cyanobacteria dynamics. Specifically, analysis of observations from the Medium Resolution Imaging Spectrometer (MERIS) and Ocean and Land Colour Instrument (OLCI), coincident with the Moderate Resolution Imaging Spectroradiometer (MODIS) demonstrates how the spectral band configuration of the latter affects bloom detection. Using an Optical Water Types (OWT) library understood to capture cyanobacterial biomass through varying vertical mixing states, this analysis shows that MODIS can identify optically distinct conditions like surface accumulations but fails to resolve initial bloom evolution in well-mixed conditions, particularly in optically complex regions. Investigation of coherent ecoregions formed using Self-organising Maps trained on OWT membership scores confirm that MODIS captures broad spatial patterns seen with more capable sensors but compresses optical gradients into fewer optical types. These constraints have significant implications for interpreting spatial–temporal dynamics of cyanobacteria in large waterbodies, particularly during 2012–2016 when MERIS and OLCI sensors were absent, and small waterbodies, where high spatial resolution sensors not originally design to study water are used. In addition, these findings underscore the importance of key wavebands in future sensor design and the development of approaches to maintain consistent long-term records across evolving satellite capabilities. Our findings suggest that attempts at quantitatively harmonising cyanobacteria bloom detection across sensors may not be ecologically appropriate unless these observation biases are addressed. For example, analysing the frequency and intensity of surfacing blooms, while considering the meteorological factors that may drive these phenomena, could be considered over decadal timescales, whereas trend analysis of mixed-column biomass should only concern appropriate sensor observation periods.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 323-339"},"PeriodicalIF":10.6,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144680018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
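Forming ecoregions from per-pixel OWT membership scores with a self-organising map can be sketched with the MiniSom library: each SOM node acts as a candidate ecoregion and each pixel maps to its winning node. The grid size, OWT count, and synthetic memberships below are assumptions for illustration:

```python
import numpy as np
from minisom import MiniSom

n_pixels, n_owt = 10_000, 13            # assumed number of OWT classes
memberships = np.random.default_rng(1).random((n_pixels, n_owt))
memberships /= memberships.sum(axis=1, keepdims=True)  # scores sum to 1

som = MiniSom(4, 4, n_owt, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(memberships, 5000)

# Each pixel is assigned to its best-matching SOM node (a candidate ecoregion).
nodes = np.array([som.winner(m) for m in memberships])   # (n_pixels, 2) coords
ecoregion_id = nodes[:, 0] * 4 + nodes[:, 1]
print(np.unique(ecoregion_id).size, "occupied ecoregions")
```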
STFMamba: Spatiotemporal satellite image fusion network based on visual state space model
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-22. DOI: 10.1016/j.isprsjprs.2025.07.011
Min Zhao , Xiaolu Jiang , Bo Huang
{"title":"STFMamba: Spatiotemporal satellite image fusion network based on visual state space model","authors":"Min Zhao ,&nbsp;Xiaolu Jiang ,&nbsp;Bo Huang","doi":"10.1016/j.isprsjprs.2025.07.011","DOIUrl":"10.1016/j.isprsjprs.2025.07.011","url":null,"abstract":"<div><div>Remote sensing images provide extensive information about Earth’s surface, supporting a wide range of applications. Individual sensors often encounter a trade-off between spatial and temporal resolutions, spatiotemporal fusion (STF) aims to overcome this shortcoming by combining multisource data. Existing deep learning-based STF methods struggle with capturing long-range dependencies (CNN-based) or incur high computational cost (Transformer-based). To overcome these limitations, we propose STFMamba, a two-step state space model that effectively captures global information while maintaining linear complexity. Specifically, a super-resolution (SR) network is firstly utilized to mitigate sensor heterogeneity of multisource data, then a dual U-Net is designed to fully leverage spatio-temporal correlations and capture temporal variations. Our STFMamba contains the following three key components: 1) the multidimensional scanning mechanism for global relationship modeling to eliminate information loss, 2) a spatio-spectral–temporal fusion scanning strategy to integrate multiscale contextual features, and 3) a multi-head cross-attention module for adaptive selection and fusion. Additionally, we develop a lightweight version of STFMamba for deployment on resource-constrained devices, incorporating a knowledge distillation strategy to align its features with the base model and enhance performance. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed method. Specifically, our method outperforms compared methods, including FSDAF, FVSDF, EDCSTFN, GANSTFM, SwinSTFM, and DDPMSTF, with average RMSE reductions of 24.25%, 25.94%, 18.15%, 14.36%, 9.63%, and 12.82%, respectively. Our code is available at: <span><span>https://github.com/zhaomin0101/STFMamba</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 288-304"},"PeriodicalIF":10.6,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144680104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
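The lightweight variant is trained with a distillation objective that aligns its features with the base model. A minimal sketch of one plausible form of that loss; the loss terms, weights, and which features are matched are assumptions, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, student_feat, teacher_out, teacher_feat,
                      target, alpha: float = 0.5, beta: float = 0.1):
    """Task loss plus feature alignment and output mimicry of the base model."""
    task = F.l1_loss(student_out, target)                   # fusion reconstruction
    feat = F.mse_loss(student_feat, teacher_feat.detach())  # feature alignment
    out = F.mse_loss(student_out, teacher_out.detach())     # output mimicry
    return task + alpha * feat + beta * out

# Usage: run both models on the same coarse/fine input pair, then
# loss = distillation_loss(s_out, s_feat, t_out, t_feat, fine_reference)
```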
Mind the modality gap: Towards a remote sensing vision-language model via cross-modal alignment
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-21. DOI: 10.1016/j.isprsjprs.2025.06.019
Angelos Zavras , Dimitrios Michail , Begüm Demir , Ioannis Papoutsis
{"title":"Mind the modality gap: Towards a remote sensing vision-language model via cross-modal alignment","authors":"Angelos Zavras ,&nbsp;Dimitrios Michail ,&nbsp;Begüm Demir ,&nbsp;Ioannis Papoutsis","doi":"10.1016/j.isprsjprs.2025.06.019","DOIUrl":"10.1016/j.isprsjprs.2025.06.019","url":null,"abstract":"&lt;div&gt;&lt;div&gt;Deep Learning (DL) is undergoing a paradigm shift with the emergence of foundation models. In this work, we focus on Contrastive Language-Image Pre-training (CLIP), a Vision-Language foundation model that achieves high accuracy across various image classification tasks and often rivals fully supervised baselines, despite not being explicitly trained for those tasks. Nevertheless, there are still domains where zero-shot CLIP performance is far from optimal, such as Remote Sensing (RS) and medical imagery. These domains do not only exhibit fundamentally different distributions compared to natural images, but also commonly rely on complementary modalities, beyond RGB, to derive meaningful insights. To this end, we propose a methodology to align distinct RS image modalities with the visual and textual modalities of CLIP. Our two-stage procedure addresses the aforementioned distribution shift, extends the zero-shot capabilities of CLIP and enriches CLIP’s shared embedding space with domain-specific knowledge. Initially, we robustly fine-tune CLIP according to the PAINT (Ilharco et al., 2022) patching protocol, in order to deal with the distribution shift. Building upon this foundation, we facilitate the cross-modal alignment of a RS modality encoder by distilling knowledge from the CLIP visual and textual encoders. We empirically show that both patching and cross-modal alignment translate to significant performance gains, across several RS imagery classification and cross-modal retrieval benchmark datasets. Patching dramatically improves RS imagery (RGB) classification (BigEarthNet-5: +39.76% mAP, BigEarthNet-19: +56.86% mAP, BigEarthNet-43: +28.43% mAP, SEN12MS: +20.61% mAP, EuroSAT: +5.98% Acc), while it maintains performance on the representative supported task (ImageNet), and most critically it outperforms existing RS-specialized CLIP variants such as RemoteCLIP (Liu et al., 2023a) and SkyCLIP (Wang et al., 2024). Cross-modal alignment extends zero-shot capabilities to multi-spectral data, surpassing our patched CLIP classification performance and establishing strong cross-modal retrieval baselines. Linear probing further confirms the quality of learned representations of our aligned multi-spectral encoder, outperforming existing RS foundation models such as SatMAE (Cong et al., 2022). Notably, these enhancements are achieved without the reliance on textual descriptions, without introducing any task-specific parameters, without training from scratch and without catastrophic forgetting. Our work highlights the potential of leveraging existing VLMs’ large-scale pre-training and extending their zero-shot capabilities to specialized fields, paving the way for resource efficient establishment of in-domain multi-modal foundation models in RS and beyond. 
We make our code implementation and weights for all experiments publicly available on our project’s GitHub repository &lt;span&gt;&lt;span&gt;https://github.com/Orion-AI-Lab/MindTheModalityGap&lt;/span&gt;&lt;svg&gt;&lt;","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 270-287"},"PeriodicalIF":10.6,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144670817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
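PAINT-style patching, which the abstract's first stage follows, linearly interpolates zero-shot and fine-tuned weights of the same architecture and selects the mixing coefficient on held-out data. A minimal sketch; the evaluation callable and selection loop are assumptions left as comments:

```python
import copy
import torch

def patch(zeroshot: torch.nn.Module, finetuned: torch.nn.Module,
          alpha: float) -> torch.nn.Module:
    """theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned."""
    sd_zs, sd_ft = zeroshot.state_dict(), finetuned.state_dict()
    mixed = {k: (1 - alpha) * sd_zs[k].float() + alpha * sd_ft[k].float()
             for k in sd_zs}
    patched = copy.deepcopy(zeroshot)
    patched.load_state_dict(mixed)
    return patched

# Typical selection (evaluate() is hypothetical): sweep alpha in [0, 1] and
# keep the patched model that best trades target-task gains against
# regression on the supported task (e.g. ImageNet).
# for a in torch.linspace(0, 1, 11): score = evaluate(patch(zs, ft, float(a)))
```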
Three-dimensional reconstruction of shallow seabed topographic surface based on fusion of side-scan sonar and echo sounding data
IF 10.6, CAS Tier 1, Earth Science
ISPRS Journal of Photogrammetry and Remote Sensing. Pub Date: 2025-07-21. DOI: 10.1016/j.isprsjprs.2025.07.018
Chunqing Ran , Luotao Zhang , Shuo Han , Xiaobo Zhang , Shengli Wang , Xinghua Zhou
{"title":"Three-dimensional reconstruction of shallow seabed topographic surface based on fusion of side-scan sonar and echo sounding data","authors":"Chunqing Ran ,&nbsp;Luotao Zhang ,&nbsp;Shuo Han ,&nbsp;Xiaobo Zhang ,&nbsp;Shengli Wang ,&nbsp;Xinghua Zhou","doi":"10.1016/j.isprsjprs.2025.07.018","DOIUrl":"10.1016/j.isprsjprs.2025.07.018","url":null,"abstract":"<div><div>High-precision topographic mapping of offshore shallow seabed has great significance in a number of fields, including shipping navigation, disaster warning, environmental monitoring and resource management. However, conventional side-scan sonar (SSS) techniques are difficult to obtain seabed elevation data, which limits their application in the field of three-dimensional (3D) topographic reconstruction. Meanwhile, although single-beam echo sounder (SBES) can provide accurate depth information, it is difficult to capture the details of complex terrain due to sparse spatial coverage. In order to overcome the limitations of a single technique in 3D seafloor topographic reconstruction applications, this study fuses SSS and SBES data, and proposes the Multi-Scale Gradient Fusion Shape From Shading (MSGF-SFS) algorithm. This algorithm extracts and fuses surface gradient information by analyzing the intensity variations in SSS images at multiple scales. This enables the construction of a 3D discrete elevation model from two-dimensional (2D) SSS data. In order to reduce the inherent elevation error of SSS, the topographic feature extraction and least squares optimization for multi-source data alignment and correction algorithm is introduced, which combines terrain feature extraction and least squares optimization to fuse the SBES depth data with the 3D discrete elevation model for calibration. The quality of the 3D discrete elevation model was then optimized by the data filtering based on quadtree domain partitioning and least squares function. Finally, a high-resolution 3D continuous seabed model was constructed on the basis of the filtered data using implicit function based on Undirected Distance Function (IF-UDF) deep learning algorithm. Based on the above methods, this study realized the 3D seabed topography reconstruction of an offshore area in the Yellow Sea of China and conducted comparative experiments. The findings demonstrate that a series of methods in this paper can effectively reconstruct a fine 3D seabed model, and the obtained model is better than the existing 3D reconstruction techniques in terms of normal consistency and continuity, and shows stronger robustness and higher accuracy than the traditional algorithms. 
This method provides a systematic and practical solution for high-resolution offshore topographic mapping, especially for high-precision requirements in complex environments, and can effectively serve as an alternative to multibeam systems in the field of offshore topography mapping.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 249-269"},"PeriodicalIF":10.6,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144670816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
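The core of shape-from-shading here is turning backscatter intensity variation into surface gradients at several scales, fusing them, and integrating to heights. A heavily simplified numpy/scipy sketch; the gradient model and the naive cumulative-sum integration are assumptions, and the SBES calibration and filtering steps are omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def relative_elevation(intensity: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """intensity: 2-D SSS backscatter image -> relative, uncalibrated heights."""
    gx = np.zeros_like(intensity, dtype=float)
    gy = np.zeros_like(intensity, dtype=float)
    for s in scales:  # multi-scale gradient extraction and averaging (fusion)
        sy, sx = np.gradient(gaussian_filter(intensity.astype(float), s))
        gx += sx / len(scales)
        gy += sy / len(scales)
    # Naive integration: average of row-wise and column-wise cumulative sums.
    z = 0.5 * (np.cumsum(gx, axis=1) + np.cumsum(gy, axis=0))
    return z - z.mean()  # relative heights; absolute depth needs SBES control
```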