{"title":"Using difference features effectively: A multi-task network for exploring change areas and change moments in time series remote sensing images","authors":"Jialu Li, Chen Wu","doi":"10.1016/j.isprsjprs.2024.09.029","DOIUrl":"10.1016/j.isprsjprs.2024.09.029","url":null,"abstract":"<div><div>With the rapid advancement in remote sensing Earth observation technology, an abundance of Time Series multispectral remote sensing Images (TSIs) from platforms like Landsat and Sentinel-2 are now accessible, offering essential data support for Time Series remote sensing images Change Detection (TSCD). However, TSCD faces misalignment challenges due to variations in radiation incidence angles, satellite orbit deviations, and other factors when capturing TSIs at the same geographic location but different times. Furthermore, another important issue that needs immediate attention is the precise determination of change moments for change areas within TSIs. To tackle these challenges, this paper proposes Multi-RLD-Net, a multi-task network that efficiently utilizes difference features to explore change areas and corresponding change moments in TSIs. To the best of our knowledge, this is the first time deep learning has been used to identify change moments in TSIs. Multi-RLD-Net integrates Optical Flow with Long Short-Term Memory (LSTM) to derive differences between TSIs. Initially, a lightweight encoder is introduced to extract multi-scale spatial features, which maximally preserve original features through a Siamese structure. Subsequently, shallow spatial features extracted by the encoder are input into the novel Recursive Optical Flow Difference (ROD) module to align input features and detect differences between them, while deep spatial features extracted by the encoder are input into LSTM to capture long-term temporal dependencies and differences between hidden states. Both branches output differences among TSIs, enhancing the expressive capacity of the model. 
Finally, the decoder identifies change areas and their corresponding change moments using multi-task branches. Experiments on the UTRNet dataset and the DynamicEarthNet dataset demonstrate that the proposed RLD-Net and Multi-RLD-Net outperform representative approaches, achieving F1 score improvements of 1.29% and 10.42% compared to the state-of-the-art method MC<sup>2</sup>ABNet. The source code will be available soon at <span><span>https://github.com/lijialu144/Multi-RLD-Net</span></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 487-505"},"PeriodicalIF":10.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142357944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhaojun Chen , Huaiqing Zhang , Meng Zhang , Yehong Wu , Yang Liu
{"title":"Mangrove mapping in China using Gaussian mixture model with a novel mangrove index (SSMI) derived from optical and SAR imagery","authors":"Zhaojun Chen , Huaiqing Zhang , Meng Zhang , Yehong Wu , Yang Liu","doi":"10.1016/j.isprsjprs.2024.09.026","DOIUrl":"10.1016/j.isprsjprs.2024.09.026","url":null,"abstract":"<div><div>As an important shoreline vegetation and highly productive ecosystem, mangroves play an essential role in the protection of coastlines and ecological diversity. Accurate mapping of the spatial distribution of mangroves is crucial for the protection and restoration of mangrove ecosystems. Supervised classification methods rely on large sample sets and complex classifiers, while traditional thresholding methods require empirical thresholds; these problems limit the feasibility and stability of existing mangrove identification and mapping methods at large scales. Thus, this paper develops a novel mangrove index (spectral and SAR mangrove index, SSMI) and a Gaussian mixture model (GMM) mangrove mapping method, which does not require training samples and can automatically and accurately map mangrove boundaries by utilizing only single-scene Sentinel-1 and single-scene Sentinel-2 images from the same time period. The SSMI capitalizes on the fact that mangroves are differentiated from other land cover types in terms of optical characteristics (greenness and moisture) and backscattering coefficients of SAR images, and ultimately highlights mangrove forest information through the product of three expressions (<em>f</em>(<em>S</em>) = red edge/SWIR1, <em>f</em>(<em>B</em>) = 1/(1 + e<sup>-VH</sup>), <em>f</em>(<em>W</em>)=(NIR-SWIR1)/(NIR+SWIR1)). The proposed SSMI was tested in six typical mangrove distribution areas in China where climatic conditions and mangrove species vary widely. 
The results indicated that the SSMI was more capable of mapping mangrove forests than the other mangrove indices (CMRI, NDMI, MVI, and MI), with overall accuracies (OA) higher than 0.90 and F1 scores as high as 0.93 for the five areas other than the Maowei Gulf (S5). Moreover, the mangrove maps generated by the SSMI were highly consistent with the reference maps (HGMF_2020, LASAC_2018, and IMMA). In addition, the SSMI achieves stable performance, as shown by the mapping results of the other two classification methods (K-means and Otsu’s algorithm). Mangrove mapping in six typical mangrove distribution areas in China for five consecutive years (2019–2023) and experiments in three Southeast Asian countries with major mangrove distributions (Thailand, Vietnam, and Indonesia) demonstrated that the SSMI constructed in this paper is highly stable across time and space. The SSMI proposed in this paper does not require reference samples or predefined parameters; thus, it has great flexibility and applicability in mapping mangroves on a large scale, especially in cloudy areas.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 466-486"},"PeriodicalIF":10.6,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142357925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
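The SSMI abstract above states its three component expressions explicitly, so the index itself can be sketched directly. This is a minimal illustrative sketch, not the authors' released code; the function name `ssmi` and the convention that optical bands are surface reflectance while VH is the Sentinel-1 backscatter value fed straight into the logistic term are assumptions for the example.

```python
import numpy as np

def ssmi(red_edge, swir1, nir, vh):
    """Sketch of the SSMI as the product of the three terms given in the
    abstract. Inputs are numpy arrays of matching shape: Sentinel-2
    red-edge, SWIR1, and NIR bands, plus the Sentinel-1 VH channel
    (assumed usable as-is in the logistic term)."""
    f_s = red_edge / swir1                # f(S) = red edge / SWIR1
    f_b = 1.0 / (1.0 + np.exp(-vh))      # f(B) = 1 / (1 + e^(-VH))
    f_w = (nir - swir1) / (nir + swir1)  # f(W) = (NIR - SWIR1) / (NIR + SWIR1)
    return f_s * f_b * f_w               # SSMI highlights mangrove pixels
```

The moisture term f(W) is the familiar NDWI-style normalized difference, so mangroves (green, wet, and with distinctive VH backscatter) score high on all three factors at once.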
Helena Calatrava , Bhavya Duvvuri , Haoqing Li , Ricardo Borsoi , Edward Beighley , Deniz Erdoğmuş , Pau Closas , Tales Imbiriba
{"title":"Recursive classification of satellite imaging time-series: An application to land cover mapping","authors":"Helena Calatrava , Bhavya Duvvuri , Haoqing Li , Ricardo Borsoi , Edward Beighley , Deniz Erdoğmuş , Pau Closas , Tales Imbiriba","doi":"10.1016/j.isprsjprs.2024.09.003","DOIUrl":"10.1016/j.isprsjprs.2024.09.003","url":null,"abstract":"<div><div>Despite the extensive body of literature focused on remote sensing applications for land cover mapping and the availability of high-resolution satellite imagery, methods for continuously updating classification maps in real-time remain limited, especially when training data is scarce. This paper introduces the recursive Bayesian classifier (RBC), which converts any instantaneous classifier into a robust online method through a probabilistic framework that is resilient to non-informative image variations. Three experiments are conducted using Sentinel-2 data: water mapping of the Oroville Dam in California and the Charles River basin in Massachusetts, and deforestation detection in the Amazon. RBC is applied to a Gaussian mixture model (GMM), logistic regression (LR), and our proposed spectral index classifier (SIC). Results show that RBC significantly enhances classifier robustness in multitemporal settings under challenging conditions, such as cloud cover and cyanobacterial blooms. Specifically, balanced classification accuracy improves by up to 26.95% for SIC, 12.4% for GMM, and 13.81% for LR in water mapping, and by 15.25%, 14.17%, and 14.7% in deforestation detection. Moreover, without additional training data, RBC improves the performance of the state-of-the-art DeepWaterMap and WatNet algorithms by up to 9.62% and 11.03%. 
These benefits are provided by RBC while requiring minimal supervision and maintaining a low computational cost that remains constant for each time step regardless of the time-series length.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 447-465"},"PeriodicalIF":10.6,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142327791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
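The RBC abstract above describes converting an instantaneous classifier into a recursive online one with constant per-step cost. A single update step of a generic recursive Bayesian classifier can be sketched as follows; this is a hedged illustration, not the paper's method: the sticky transition model and its `epsilon` parameter are assumptions introduced for the example, and `class_likelihoods` stands in for the output of any instantaneous classifier (e.g., a GMM).

```python
import numpy as np

def recursive_bayes_update(prior, class_likelihoods, epsilon=0.05):
    """One time step of a generic recursive Bayesian class update.
    prior: (K,) posterior class probabilities from the previous image.
    class_likelihoods: (K,) p(x_t | class k) for the current observation.
    epsilon: assumed probability that a pixel changes class between steps."""
    K = prior.size
    # Propagate the previous posterior through a simple "sticky" transition
    # model: stay with prob 1-epsilon, move uniformly otherwise.
    transition = (1 - epsilon) * np.eye(K) + (epsilon / (K - 1)) * (1 - np.eye(K))
    predicted = transition @ prior
    # Bayes update with the current observation, then renormalize.
    posterior = class_likelihoods * predicted
    return posterior / posterior.sum()
```

Because each step only touches the current image and a (K,) state vector, the cost per step is constant regardless of the time-series length, which is the property the abstract emphasizes; a non-informative observation (equal likelihoods) leaves the propagated prior intact, which is what makes the recursion robust to clouds and blooms.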
Paulo Silva Filho , Claudio Persello , Raian V. Maretto , Renato Machado
{"title":"Mapping the Brazilian savanna’s natural vegetation: A SAR-optical uncertainty-aware deep learning approach","authors":"Paulo Silva Filho , Claudio Persello , Raian V. Maretto , Renato Machado","doi":"10.1016/j.isprsjprs.2024.09.019","DOIUrl":"10.1016/j.isprsjprs.2024.09.019","url":null,"abstract":"<div><div>The Brazilian savanna (Cerrado) is considered a hotspot for conservation. Despite its environmental and social importance, the biome has suffered a rapid transformation process due to human activities. Mapping and monitoring the remaining vegetation is essential to guide public policies for biodiversity conservation. However, accurately mapping the Cerrado’s vegetation is still an open challenge. Its diverse but spectrally similar physiognomies are a source of confusion for state-of-the-art (SOTA) methods. This study proposes a deep learning model to map the natural vegetation of the Cerrado at the regional to biome level, fusing Synthetic Aperture Radar (SAR) and optical data. The proposed model is designed to deal with uncertainties caused by the different resolutions of the input Sentinel-1/2 images (10 m) and the reference data, derived from Landsat images (30 m). We designed a multi-resolution label-propagation (MRLP) module that infers maps at both resolutions and uses the class scores from the 30 m output as features for the 10 m classification layer. We train the model with the proposed calibrated dual focal loss function in a two-stage hierarchical manner. Our results reached an overall accuracy of 70.37%, representing an increase of 15.64% compared to a SOTA random forest (RF) model. Moreover, we propose an uncertainty quantification method, which has been shown to be useful not only for validating the model but also for highlighting areas of label noise in the reference data. 
The developed codes and dataset are available on <span><span>Github</span></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 405-421"},"PeriodicalIF":10.6,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142321921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haiming Zhang , Guorui Ma , Hongyang Fan , Hongyu Gong , Di Wang , Yongxian Zhang
{"title":"SDCINet: A novel cross-task integration network for segmentation and detection of damaged/changed building targets with optical remote sensing imagery","authors":"Haiming Zhang , Guorui Ma , Hongyang Fan , Hongyu Gong , Di Wang , Yongxian Zhang","doi":"10.1016/j.isprsjprs.2024.09.024","DOIUrl":"10.1016/j.isprsjprs.2024.09.024","url":null,"abstract":"<div><div>Buildings are primary locations for human activities and key focuses in the military domain. Rapidly detecting damaged/changed buildings (DCB) and conducting detailed assessments can effectively aid urbanization monitoring, disaster response, and humanitarian assistance. Currently, the tasks of object detection (OD) and change detection (CD) for DCB are almost independent of each other, making it difficult to simultaneously determine the location and details of changes. Based on this, we have designed a cross-task network called SDCINet, which integrates OD and CD, and have created four dual-task datasets focused on disasters and urbanization. SDCINet is a novel deep learning dual-task framework composed of a consistency encoder, differentiation decoder, and cross-task global attention collaboration module (CGAC). It is capable of modeling differential feature relationships based on bi-temporal images, performing end-to-end pixel-level prediction, and object bounding box regression. The bi-directional traction function of CGAC is used to deeply couple the OD and CD tasks. Additionally, we collected bi-temporal images from 10 locations worldwide that experienced earthquakes, explosions, wars, and conflicts to construct two datasets specifically for damaged building OD and CD. We also constructed two datasets for changed building OD and CD based on two publicly available CD datasets. These four datasets can serve as data benchmarks for dual-task research on DCB. 
Using these datasets, we conducted extensive performance evaluations of 18 state-of-the-art models from the perspectives of OD, CD, and instance segmentation. Benchmark experimental results demonstrated the superior performance of SDCINet. Ablation experiments and evaluative analyses confirmed the effectiveness and unique value of CGAC.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 422-446"},"PeriodicalIF":10.6,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142321914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hua Su , Feiyan Zhang , Jianchen Teng , An Wang , Zhanchao Huang
{"title":"Reconstructing high-resolution subsurface temperature of the global ocean using deep forest with combined remote sensing and in situ observations","authors":"Hua Su , Feiyan Zhang , Jianchen Teng , An Wang , Zhanchao Huang","doi":"10.1016/j.isprsjprs.2024.09.022","DOIUrl":"10.1016/j.isprsjprs.2024.09.022","url":null,"abstract":"<div><div>Estimating high-resolution ocean subsurface temperature is of great importance for the refined study of ocean climate variability and change. However, the insufficient resolution and accuracy of subsurface temperature data greatly limit our comprehensive understanding of mesoscale and other fine-scale ocean processes. In this study, we integrated multiple remote sensing data and <em>in situ</em> observations to compare four models within two frameworks (gradient boosting and deep learning). The optimal model, Deep Forest, was selected to generate a high-resolution subsurface temperature dataset (DORS0.25°) for the upper 2000 m from 1993 to 2023. DORS0.25° exhibits excellent reconstruction accuracy, with an average <em>R</em><sup>2</sup> of 0.980 and RMSE of 0.579 °C, and its monthly average accuracy is higher than that of the IAP and ORAS5 datasets. Particularly, DORS0.25° can effectively capture detailed ocean warming characteristics in complex dynamic regions such as the Gulf Stream and the Kuroshio Extension, facilitating the study of mesoscale processes and warming within the global-scale ocean. Moreover, the research highlights that the rate of warming over the past decade has been significant, and ocean warming has consistently reached new highs since 2019. 
This study has demonstrated that DORS0.25° is a crucial dataset for understanding and monitoring the spatiotemporal characteristics and processes of global ocean warming, providing valuable data support for the sustainable development of the marine environment and climate change actions.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 389-404"},"PeriodicalIF":10.6,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142319168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified feature-motion consistency framework for robust image matching","authors":"Yan Zhou, Jinding Gao, Xiaoping Liu","doi":"10.1016/j.isprsjprs.2024.09.021","DOIUrl":"10.1016/j.isprsjprs.2024.09.021","url":null,"abstract":"<div><div>Establishing reliable feature matches between a pair of images in various scenarios is a long-standing open problem in photogrammetry. Attention-based detector-free matching with a coarse-to-fine architecture has been a typical pipeline to build matches, but the cross-attention module with a global receptive field may compromise structural local consistency by introducing irrelevant regions (outliers). A motion field can maintain structural local consistency under the assumption that matches for adjacent features should be spatially proximate. However, a motion field can only estimate local displacements between consecutive images and struggles with long-range displacement estimation in large-scale variation scenarios without spatial correlation priors. Moreover, large-scale variations may also disrupt the geometric consistency with the application of the mutual nearest neighbor criterion in patch-level matching, making it difficult to recover accurate matches. In this paper, we propose a unified feature-motion consistency framework for robust image matching (MOMA), to maintain structural consistency at both global and local granularity in scale-discrepancy scenarios. MOMA devises a motion consistency-guided dependency range strategy (MDR) in cross attention, aggregating highly relevant regions within the motion consensus-restricted neighborhood to favor true matchable regions. Meanwhile, a unified framework with hierarchical attention structure is established to couple the local motion field with global feature correspondence. The motion field provides local consistency constraints in feature aggregation, while feature correspondence provides a spatial context prior to improve motion field estimation. 
To alleviate the geometric inconsistency caused by the hard nearest neighbor criterion, we propose an adaptive (soft) neighbor search strategy to address scale discrepancy. Extensive experiments on three datasets demonstrate that our method outperforms solid baselines, with AUC improvements of 4.73/4.02/3.34 in the two-view pose estimation task at thresholds of 5°/10°/20° on the MegaDepth test set, and a 5.94% increase in accuracy at a threshold of 1 px in the homography task on the HPatches dataset. Furthermore, in downstream tasks such as 3D mapping, the 3D models reconstructed using our method on the self-collected SYSU UAV datasets exhibit a significant improvement in structural completeness and detail richness, demonstrating its high applicability to a wide range of downstream tasks. The code is publicly available at <span><span>https://github.com/BunnyanChou/MOMA</span></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 368-388"},"PeriodicalIF":10.6,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142319169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shikang Tao, Mengyuan Yang, Min Wang, Rui Yang, Qian Shen
{"title":"Small object change detection in UAV imagery via a Siamese network enhanced with temporal mutual attention and contextual features: A case study concerning solar water heaters","authors":"Shikang Tao, Mengyuan Yang, Min Wang, Rui Yang, Qian Shen","doi":"10.1016/j.isprsjprs.2024.09.027","DOIUrl":"10.1016/j.isprsjprs.2024.09.027","url":null,"abstract":"<div><div>Small object change detection (SOCD) based on high-spatial resolution (HSR) images is of significant practical value in applications such as the investigation of illegal urban construction, but little research is currently available. This study proposes an SOCD model called TMACNet based on a multitask network architecture. The model modifies the YOLOv8 network into a Siamese network and adds structures, including a feature difference branch (FDB), temporal mutual attention layer (TMAL) and contextual attention module (CAM), to merge differential and contextual features from different phases for the accurate extraction and analysis of small objects and their changes. To verify the proposed method, an SOCD dataset called YZDS is created based on unmanned aerial vehicle (UAV) images of small-scale solar water heaters on rooftops. The experimental results show that TMACNet exhibits strong resistance to image registration errors and building height displacement and prevents the error propagation from object detection to change detection that arises in overlay-based change detection. TMACNet also provides an enhanced approach to small object detection from the perspective of multitemporal information fusion. In the change detection task, TMACNet exhibits notable F1 improvements exceeding 5.96% in comparison with alternative change detection methods. 
In the object detection task, TMACNet outperforms the single-temporal object detection models, improving the AP metric by approximately 1–3% while simplifying the technical process.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 352-367"},"PeriodicalIF":10.6,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142318925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Pan , Xiangming Xiao , Haoming Xia , Xiaoyan Ma , Yanhua Xie , Baihong Pan , Yuanwei Qin
{"title":"Time series sUAV data reveal moderate accuracy and large uncertainties in spring phenology metric of deciduous broadleaf forest as estimated by vegetation index-based phenological models","authors":"Li Pan , Xiangming Xiao , Haoming Xia , Xiaoyan Ma , Yanhua Xie , Baihong Pan , Yuanwei Qin","doi":"10.1016/j.isprsjprs.2024.09.023","DOIUrl":"10.1016/j.isprsjprs.2024.09.023","url":null,"abstract":"<div><div>Accurate delineation of spring phenology (e.g., start of growing season, SOS) of deciduous forests is essential for understanding its responses to environmental changes. To date, SOS dates from analyses of satellite images and vegetation index (VI)-based phenological models show notable discrepancies, but these have not been fully evaluated, primarily due to the lack of ground reference data. This study evaluated the SOS dates of a deciduous broadleaf forest estimated by VI-based phenological models from three satellite sensors (PlanetScope, Sentinel-2A/B, and Landsat-7/8/9) by using ground reference data collected by a small unmanned aerial vehicle (sUAV). Daily sUAV imagery (0.035-meter resolution) was used to identify and generate green leaf maps. These green leaf maps were further aggregated to generate Green Leaf Fraction (GLF) maps at the spatial resolutions of PlanetScope (3-meter), Sentinel-2A/B (10-meter), and Landsat-7/8/9 (30-meter). The temporal changes of GLF differ from those of vegetation indices in spring, with the peak dates of GLF being much earlier than those of VIs. At the SOS dates estimated by VI-based phenological models in 2022 (Julian days from 105 to 111), GLF already ranges from 62% to 96%. The moderate accuracy and large uncertainties of SOS dates from VI-based phenological models arise from the limitations of vegetation indices in accurately tracking the number of green leaves and the inherent uncertainties of the mathematical models used. 
The results of this study clearly highlight the need for new research on spring phenology of deciduous forests.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 339-351"},"PeriodicalIF":10.6,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142314145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
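The sUAV evaluation above rests on aggregating a fine-resolution binary green-leaf map into coarser Green Leaf Fraction (GLF) grids matching each satellite's pixel size. A minimal block-averaging sketch of that aggregation follows; the abstract does not specify the exact resampling scheme, so the function name and the simple integer-factor block mean are assumptions for illustration.

```python
import numpy as np

def green_leaf_fraction(green_mask, factor):
    """Aggregate a fine-resolution binary green-leaf map into a coarser
    Green Leaf Fraction grid by block averaging.
    green_mask: 2-D array of {0, 1} (1 = green leaf pixel).
    factor: number of fine cells per coarse cell along each axis,
            e.g. roughly 3 m / 0.035 m ~ 86 for PlanetScope pixels."""
    h, w = green_mask.shape
    h, w = h - h % factor, w - w % factor  # trim edges to a multiple of factor
    blocks = green_mask[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))        # fraction of green pixels per block
```

Each coarse cell's value is simply the fraction of its fine pixels flagged as green leaf, which is what lets GLF serve as a ground reference against VI-derived SOS dates.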
Weijie Li , Wei Yang , Tianpeng Liu , Yuenan Hou , Yuxuan Li , Zhen Liu , Yongxiang Liu , Li Liu
{"title":"Predicting gradient is better: Exploring self-supervised learning for SAR ATR with a joint-embedding predictive architecture","authors":"Weijie Li , Wei Yang , Tianpeng Liu , Yuenan Hou , Yuxuan Li , Zhen Liu , Yongxiang Liu , Li Liu","doi":"10.1016/j.isprsjprs.2024.09.013","DOIUrl":"10.1016/j.isprsjprs.2024.09.013","url":null,"abstract":"<div><div>The growing volume of Synthetic Aperture Radar (SAR) data can support building a foundation model using self-supervised learning (SSL) methods, which can achieve various SAR automatic target recognition (ATR) tasks with pretraining on large-scale unlabeled data and fine-tuning on small labeled samples. SSL aims to construct supervision signals directly from the data, minimizing the need for expensive expert annotation and maximizing the use of the expanding data pool for a foundational model. This study investigates an effective SSL method for SAR ATR, which can pave the way for a foundation model in SAR ATR. The primary obstacles faced in SSL for SAR ATR are small targets in remote sensing and speckle noise in SAR images, corresponding to the SSL approach and signals. To overcome these challenges, we present a novel joint-embedding predictive architecture for SAR ATR (SAR-JEPA) that leverages local masked patches to predict the multi-scale SAR gradient representations of an unseen context. The key aspect of SAR-JEPA is integrating SAR domain features to ensure high-quality self-supervised signals as target features. In addition, we employ local masks and multi-scale features to accommodate various small targets in remote sensing. By fine-tuning and evaluating our framework on three target recognition datasets (vehicle, ship, and aircraft) using four other datasets for pretraining, we demonstrate that it outperforms other SSL methods and becomes increasingly effective as the amount of SAR data grows. This study demonstrates the potential of SSL for the recognition of SAR targets across diverse targets, scenes, and sensors. 
Our codes and weights are available at <span><span>https://github.com/waterdisappear/SAR-JEPA</span></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 326-338"},"PeriodicalIF":10.6,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142312982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}