{"title":"Structure-aware deep learning network for building height estimation","authors":"Yuehong Chen , Jiayue Zhou , Congcong Xu , Qiang Ma , Xiaoxiang Zhang , Ya’nan Zhou , Yong Ge","doi":"10.1016/j.jag.2025.104443","DOIUrl":"10.1016/j.jag.2025.104443","url":null,"abstract":"<div><div>Accurate building height information is essential for urban management and planning. However, most existing methods rely on general segmentation networks for building height estimation, often ignoring the structural characteristics of buildings. This paper proposes a novel structure-aware building height estimation (SBHE) model to address this limitation. The model is designed as a dual-branch architecture: one branch extracts building footprints from Sentinel-2 imagery, while the other estimates building heights from Sentinel-1 imagery. A structure-aware decoder and a gating mechanism are developed to integrate into SBHE to capture and account for the structural characteristics of buildings. Validation conducted in the Yangtze River Delta region of China demonstrates that SBHE achieved a more accurate building height map (RMSE = 4.62 m) than four existing methods (RMSE = 5.071 m, 7.148 m, RMSE = 10.16 m, and 13.41 m). Meanwhile, SBHE generated clearer building contours and better structural completeness. Thus, the proposed SBHE offers a robust tool for building height mapping. The source code of SBHE model can be available at: <span><span>https://github.com/cheneason/SBHE-model</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104443"},"PeriodicalIF":7.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of forest fire vulnerability prediction in Indonesia: Seasonal variability analysis using machine learning techniques","authors":"Wulan Salle Karurung , Kangjae Lee , Wonhee Lee","doi":"10.1016/j.jag.2025.104435","DOIUrl":"10.1016/j.jag.2025.104435","url":null,"abstract":"<div><div>Forest fires significantly threaten Indonesia’s tropical forests, driven by complex interactions between human activity, environmental conditions and climate variability. This research aims to identify and analyze the factors influencing forest fires in Kalimantan, Sumatra, and Papua during the rainy, dry and all-season conditions using machine learning techniques and create vulnerability prediction maps and categorize risk zones. Eight years (2015–2022) of forest fire data were combined with 15 forest fire susceptible factors that consider of human, environmental, meteorological, and land use/land cover conditioning factors. Random forest (RF) and eXtreme Gradient Boosting (XGB) machine learning models were used to train and validate the dataset through hyperparameter tuning and 10-fold cross-validation for accuracy assessment. The XGB model was selected as the best performer based on accuracy, recall, and F1-score and was used to generate probability values. The evaluation showed that the accuracies and AUC values for the nine models were greater than 0.7, with AUC values ranging from 0.71 to 0.95, indicating good performance. Papua had the highest accuracy, with 90.5%, 91.6%, and 92.5% for all, rainy, and dry seasons, respectively. Population density, elevation, precipitation, soil moisture, NDMI, NDVI, distance from roads and settlements, land surface temperature and peatlands are the key contributing factors of forest fire occurrences. Vulnerability maps categorized into five risk zones, identifying high-risk areas that aligned with observed fire occurrences. This research highlighted the diverse characteristics of factors that determine forest fires and examined their impact on fire occurrences. The findings provide actionable insights for targeted fire management strategies, though future research should incorporate additional variables to improve predictive accuracy and address long-term environmental changes.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"138 ","pages":"Article 104435"},"PeriodicalIF":7.6,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143511582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PhaseVSRnet: Deep complex network for phase-based satellite video super-resolution","authors":"Hanyun Wang , Wenke Li , Huixin Fan , Song Ji , Chenguang Dai , Yongsheng Zhang , Jin Chen , Yulan Guo , Longguang Wang","doi":"10.1016/j.jag.2025.104418","DOIUrl":"10.1016/j.jag.2025.104418","url":null,"abstract":"<div><div>Satellite video super-resolution (SR) aims to generate high-resolution (HR) frames from multiple low-resolution (LR) frames. To exploit motion cues under complicated motion patterns, most CNN-based methods first perform motion compensation and then aggregate motion cues in aligned frames (features). However, due to the low spatial resolution of satellite videos, the moving scales are usually subtle and difficult to be captured in the spatial domain. Furthermore, various scales of moving objects challenge current satellite video SR methods in motion estimation and compensation. To address these challenges for satellite video SR, we propose PhaseVSRnet to convert satellite video frames into the phase domain. By representing the motion information with phase shifts, the subtle motions are enlarged in the phase domain. Specifically, our PhaseVSRnet employs deep complex convolutions to better exploit the inherent correlation of complex-valued decompositions obtained by complex-valued steerable pyramids. Then, we adopt a coarse-to-fine motion compensation mechanism to eliminate phase ambiguity at different levels. Finally, in hierarchical reconstruction stage, we use the multi-scale fusion module to aggregate features from multiple levels and use an upsampling layer to upsample the feature maps for resolution enhancement. With PhaseVSRnet, we effectively address the subtle motions and varying scales of moving objects in satellite videos. We assess its performance on a satellite video SR dataset from Jilin-1 satellites and evaluate its generalization ability on another SR dataset from OVS-1 satellites. The results show that PhaseVSRnet effectively captures motion cues in the phase domain and exhibits strong generalization capability across different satellite sensors in unseen scenarios.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"138 ","pages":"Article 104418"},"PeriodicalIF":7.6,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143511583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unlocking the hidden secrets of the 2023 Al Haouz earthquake: Coseismic model reveals intraplate reverse faulting in Morocco derived from SAR and seismic data","authors":"Min Bao , Mohamed I. Abdelaal , Mohamed Saleh , Mimoun Chourak , Makkaoui Mohamed , Mengdao Xing","doi":"10.1016/j.jag.2025.104420","DOIUrl":"10.1016/j.jag.2025.104420","url":null,"abstract":"<div><div>The 2023 Mw 6.8 Al Haouz earthquake struck Morocco’s Atlas Mountains on September 8, causing over 3000 fatalities and extensive damage, revealing hidden seismic hazards in this slowly deforming region. Despite its impact, Al Haouz earthquake has received limited scientific investigation. The absence of surface rupture, its occurrence in an intraplate seismic silence zone, and ambiguous focal mechanisms have hindered understanding of the fault’s kinematics. To address these gaps, our study employs the Interferometric Synthetic Aperture Radar (InSAR) technique to refine the coseismic deformation. We further propose two fault-dipping scenarios, northward and southward, reinforced by a unique local seismic dataset to evaluate the fault rupture characterization. Additionally, stress change analysis assessed the stress transfer effects between the mainshock and aftershocks, culminating in a comprehensive geodynamic model. Our findings reveal a northward-dipping reverse fault with a strike of <span><math><mrow><mn>249</mn><mo>.</mo><msup><mrow><mn>8</mn></mrow><mrow><mo>∘</mo></mrow></msup></mrow></math></span>, a dip of 66°, and a rake of 55°, exhibiting a maximum slip of 1.75 m. Stress change analysis demonstrates that stress transfer from the mainshock reactivated pre-existing faults, particularly the Tizi n’Test fault system, triggering shallow aftershocks in high-stress zones. We suggest that mantle upwelling, coupled with fluid injection along pre-existing faults, drives seismic dynamics in the region. The Tizi n’Test fault likely extends to the lithosphere–asthenosphere boundary, where active upwelling facilitates magma fluid intrusion, stimulating seismic activity. These findings are consistent with recent research, providing deeper insights into fault mechanics in the Atlas Mountains. They also highlight the significant contribution of satellite-based SAR techniques in uncovering hidden seismic hazards.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104420"},"PeriodicalIF":7.6,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Remote sensing image interpretation of geological lithology via a sensitive feature self-aggregation deep fusion network","authors":"Kang He , Jie Dong , Haozheng Ma , Yujie Cai , Ruyi Feng , Yusen Dong , Lizhe Wang","doi":"10.1016/j.jag.2025.104384","DOIUrl":"10.1016/j.jag.2025.104384","url":null,"abstract":"<div><div>Geological lithological interpretation is a key focus in Earth observation research, with applications in resource surveys, geological mapping, and environmental monitoring. Although deep learning (DL) methods has significantly improved the performance of lithological remote sensing interpretation, its accuracy remains far below the level achieved by visual interpretation performed by domain experts. This disparity is primarily due to the heavy reliance of current intelligent lithological interpretation methods on remote sensing imagery (RSI), coupled with insufficient exploration of sensitive features (SF) and prior knowledge (PK), resulting in low interpretation precision. Furthermore, multi-modal SF and PK exhibit significant spatiotemporal heterogeneity, which hinders their direct integration into DL networks. In this work, we propose the sensitive feature self-aggregation deep fusion network (SFA-DFNet). Inspired by the visual interpretation practices of domain experts, we selected the five most commonly used SF and one type of PK as multi-modal supplementary information. To address the spatiotemporal heterogeneity of SF and PK, we designed a self-aggregation mechanism (SA-Mechanism) that dynamically selects and optimizes beneficial information from multi-modal features for lithological interpretation. This mechanism has broad applicability and can be extended to support any number of modal data. Additionally, we introduced the cross-modal feature interaction fusion module (CM-FIFM), which enhances the effective exchange and fusion of RSI, SF, and PK by leveraging long-range contextual information. Experimental results on two datasets demonstrate that differences in lithological genesis and types are critical factors affecting interpretation accuracy. Compared with seven SOTA DL models, our method achieves more than a 3% improvement in mIoU, showcasing its effectiveness and robustness.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104384"},"PeriodicalIF":7.6,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A SAM-adapted weakly-supervised semantic segmentation method constrained by uncertainty and transformation consistency","authors":"Yinxia Cao , Xin Huang , Qihao Weng","doi":"10.1016/j.jag.2025.104440","DOIUrl":"10.1016/j.jag.2025.104440","url":null,"abstract":"<div><div>Semantic segmentation of remote sensing imagery is a fundamental task to generate pixel-wise category maps. Existing deep learning networks rely heavily on dense pixel-wise labels, incurring high acquisition costs. Given this challenge, this study introduces sparse point labels, a type of cost-effective weak labels, for semantic segmentation. Existing weakly-supervised methods often leverage low-level visual or high-level semantic features from networks to generate supervision information for unlabeled pixels, which can easily lead to the issue of label noises. Furthermore, these methods rarely explore the general-purpose foundation model, segment anything model (SAM), with strong zero-shot generalization capacity in image segmentation. In this paper, we proposed a SAM-adapted weakly-supervised method with three components: 1) an adapted EfficientViT-SAM network (AESAM) for semantic segmentation guided by point labels, 2) an uncertainty-based pseudo-label generation module to select reliable pseudo-labels for supervising unlabeled pixels, and 3) a transformation consistency constraint for enhancing AESAM’s robustness to data perturbations. The proposed method was tested on the ISPRS Vaihingen dataset (collected from airplane), the Zurich Summer dataset (satellite), and the UAVid dataset (drone). Results demonstrated a significant improvement in mean F1 (by 5.89 %–10.56 %) and mean IoU (by 5.95 %–11.13 %) compared to the baseline method. Compared to the closest competitors, there was an increase in mean F1 (by 0.83 %–5.29 %) and mean IoU (by 1.04 %–6.54 %). Furthermore, our approach requires only fine-tuning a small number of parameters (0.9 M) using cheap point labels, making it promising for scenarios with limited labeling budgets. The code is available at <span><span>https://github.com/lauraset/SAM-UTC-WSSS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104440"},"PeriodicalIF":7.6,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143478958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Density uncertainty quantification with NeRF-Ensembles: Impact of data and scene constraints","authors":"Miriam Jäger, Steven Landgraf, Boris Jutzi","doi":"10.1016/j.jag.2025.104406","DOIUrl":"10.1016/j.jag.2025.104406","url":null,"abstract":"<div><div>In the fields of computer graphics, computer vision and photogrammetry, Neural Radiance Fields (NeRFs) are a major topic driving current research and development. However, the quality of NeRF-generated 3D scene reconstructions and subsequent surface reconstructions, heavily relies on the network output, particularly the density. Regarding this critical aspect, we propose to utilize NeRF-Ensembles that provide a density uncertainty estimate alongside the mean density. We demonstrate that data constraints such as low-quality images and poses lead to a degradation of the rendering quality, increased density uncertainty and decreased predicted density. Even with high-quality input data, the density uncertainty varies based on scene constraints such as acquisition constellations, occlusions and material properties. NeRF-Ensembles not only provide a tool for quantifying the uncertainty but exhibit two promising advantages: Enhanced robustness and artifact removal. Through the mean densities, small outliers are removed, yielding a smoother output with improved completeness. Furthermore, applying a density uncertainty-guided artifact removal in post-processing proves effective for the separation of object and artifact areas. We conduct our methodology on 3 different datasets: (i) synthetic benchmark dataset, (ii) real benchmark dataset, (iii) real data under realistic recording conditions and sensors.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104406"},"PeriodicalIF":7.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143478957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-grained building function recognition with street-view images and GIS map data via geometry-aware semi-supervised learning","authors":"Weijia Li , Jinhua Yu , Dairong Chen , Yi Lin , Runmin Dong , Xiang Zhang , Conghui He , Haohuan Fu","doi":"10.1016/j.jag.2025.104386","DOIUrl":"10.1016/j.jag.2025.104386","url":null,"abstract":"<div><div>The diversity of building functions is vital for urban planning and optimizing infrastructure and services. Street-view images offer rich exterior details, aiding in function recognition. However, street-view building function annotations are limited and challenging to obtain. In this work, we propose a geometry-aware semi-supervised method for fine-grained building function recognition, which effectively uses multi-source geoinformation data to achieve accurate function recognition in both single-city and cross-city scenarios. We restructured the semi-supervised method based on the Teacher–Student architecture into three stages, which involve pre-training for building facade recognition, building function annotation generation, and building function recognition. In the first stage, to enable semi-supervised training with limited annotations, we employ a semi-supervised object detection model, which trains on both labeled samples and a large amount of unlabeled data simultaneously, achieving building facade detection. In the second stage, to further optimize the pseudo-labels, we effectively utilize the geometric spatial relationships between GIS map data and panoramic street-view images, integrating the building function information with facade detection results. We ultimately achieve fine-grained building function recognition in both single-city and cross-city scenarios by combining the coarse annotations and labeled data in the final stage. We conduct extensive comparative experiments on four datasets, which include OmniCity, Madrid, Los Angeles, and Boston, to evaluate the performance of our method in both single-city (OmniCity & Madrid) and cross-city (OmniCity - Los Angeles & OmniCity - Boston) scenarios. The experimental results show that, compared to advanced recognition methods, our method improves mAP by at least 4.8% and 4.3% for OmniCity and Madrid, respectively, while also effectively handling class imbalance. Furthermore, our method performs well in the cross-categorization system experiments for Los Angeles and Boston, highlighting its strong potential for cross-city tasks. This study offers a new solution for large-scale and multi-city applications by efficiently utilizing multi-source geoinformation data, enhancing urban information acquisition efficiency, and assisting in rational resource allocation.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104386"},"PeriodicalIF":7.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143474903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated graph-spatial method for high-performance geospatial-temporal semantic query","authors":"Zichen Yue , Wei Zhu , Xin Mei , Shaobo Zhong","doi":"10.1016/j.jag.2025.104437","DOIUrl":"10.1016/j.jag.2025.104437","url":null,"abstract":"<div><div>Knowledge graphs (KGs) have gained significant attention in the GIS community as a cutting-edge technology for linking heterogeneous and multimodal data sources. However, the efficiency of semantic querying of geospatial-temporal data in KGs remains a challenge. Graph databases excel at handling complex semantic associations but exhibit low efficiency in geospatial analysis tasks, such as topological analysis and geographic calculations, while relational databases excel at geospatial data storage and computation but struggle to efficiently process association analysis. To address this issue, we propose GraST, a geospatial-temporal semantic query optimization method that integrates property graphs and relational databases. GraST stores complete geospatial-temporal objects in a relational database (using built-in or extended spatial data engines), and employs spatiotemporal partitioning and indexing to enhance query efficiency. Simultaneously, GraST stores lightweight geospatial-temporal nodes in the graph database and links them to multi-granularity time tree and Geohash encoding nodes to enhance spatiotemporal aggregation capabilities. During query processing, user queries are broken down into graph semantic searches and geospatial calculations, pushed down to the graph and relational database for execution. Additionally, GraST adopts the two-phase commit protocol for cross-database data synchronization. We implemented a GraST prototype system by integrating PostGIS and Neo4j, and conducted performance evaluations and case studies on large-scale real-world datasets. Experimental results demonstrate that GraST shortens query response times by 1–2 orders of magnitude and offers flexible support for diverse geospatial-temporal semantic queries.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104437"},"PeriodicalIF":7.6,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143464027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporal masked pre-training for advancing crop mapping on satellite image time series with limited labels","authors":"Xiaolei Qin , Haonan Guo , Xin Su , Zhenghui Zhao , Di Wang , Liangpei Zhang","doi":"10.1016/j.jag.2025.104426","DOIUrl":"10.1016/j.jag.2025.104426","url":null,"abstract":"<div><div>Accurate crop mapping plays a critical role in optimizing agricultural monitoring and ensuring food security. Although data-driven deep learning methods have demonstrated success in crop mapping with satellite image time series (SITS) data, their promising performances heavily depend on labeled training samples. Nevertheless, the difficulty of annotating crop types often results in labeled data scarcity, leading to a decline in the model’s performance. Self-supervised learning (SSL) is a novel technique for crop mapping with limited labels. However, the existing SSL methods applied to SITS data typically explore masking solely on temporal dimension, which cannot guarantee strong spatial representation and therefore hinders the accurate prediction of complex crop fields. Furthermore, these methods sequentially extract spatial and temporal information without fully integrating information across different dimensions. In this study, we propose a spatiotemporal masking strategy for pre-training a SpatioTemporal Collaborative Learning Network (STCLN) to extract informative spatial and temporal representations from SITS data. Additionally, we design a SpatioTemporal Attention (STA) module in STCLN that integrates representations from spatial and temporal dimensions. The experimental results on two crop type mapping benchmarks encompassing various crop types demonstrate the outperformance of our proposed method. STCLN_wp outperforms the previous state-of-the-art (SOTA) methods with 6.49% higher mIoU on PASTIS dataset and 4.04% higher mIoU on MTLCC dataset. The ablation experiments on pre-training, masking strategies, and the STA module validate the effectiveness of our methodological design. Additionally, experiments conducted under varying sizes of the training set highlight the superior generalization ability of our method for crop type mapping in label-scarce situations. The code of our method is available at <span><span>https://github.com/XiaoleiQinn/STCLN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"137 ","pages":"Article 104426"},"PeriodicalIF":7.6,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143464028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}