Generalization in deep learning-based aircraft classification for SAR imagery
Andrea Pulella, Francescopaolo Sica, Carlos Villamil Lopez, Harald Anglberger, Ronny Hänsch
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 312–323. Published 2024-11-08. DOI: 10.1016/j.isprsjprs.2024.10.030

Abstract: Automatic Target Recognition (ATR) from Synthetic Aperture Radar (SAR) data covers a wide range of applications. SAR ATR helps to detect and track vehicles and other objects, e.g. in disaster relief and surveillance operations. Aircraft classification forms a significant part of this research area and differs from other SAR-based ATR tasks, such as ship and ground vehicle detection and classification, in that aircraft are usually static targets, often remaining at the same location and in a given orientation for long time frames. Today, there is a significant mismatch between the abundance of deep learning-based aircraft classification models and the availability of corresponding datasets. This mismatch has led to models with improved classification performance on specific datasets, but the challenge of generalizing to conditions not present in the training data (which are expected to occur under operational conditions) has not yet been satisfactorily analyzed. This paper evaluates how the classification performance and generalization capabilities of deep learning models are influenced by the diversity of the training dataset. Our goal is to understand under which conditions a model can achieve proficiency in aircraft classification for high-resolution SAR images while generalizing to novel data that include different geographic locations, environmental conditions, and geometric variations.

We address this gap using manually annotated high-resolution SAR data from TerraSAR-X and TanDEM-X and show how classification performance changes across application scenarios that require different training and evaluation setups. We find that, as expected, the type of aircraft plays a crucial role in the classification problem, since aircraft vary in shape and dimension. However, these aspects are secondary to how the SAR image is acquired, with the acquisition geometry playing the primary role: the characteristics of the acquisition are much more relevant for generalization than the complex geometry of the target. We show this for various models selected from standard classification algorithms.
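The cross-condition evaluation the abstract argues for (testing on conditions absent from training rather than on random splits) can be sketched as a leave-one-acquisition-out protocol. The record fields and values below are illustrative, not taken from the paper's dataset:

```python
from collections import defaultdict

def leave_one_out_splits(samples, key):
    """Yield (held_out_value, train, test) splits grouped by an acquisition property."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[key]].append(s)
    for value, test in groups.items():
        train = [s for s in samples if s[key] != value]
        yield value, train, test

# Toy records: aircraft type and (hypothetical) incidence angle of the acquisition.
samples = [
    {"aircraft": "A320", "incidence_deg": 31},
    {"aircraft": "B747", "incidence_deg": 31},
    {"aircraft": "A320", "incidence_deg": 44},
    {"aircraft": "B747", "incidence_deg": 55},
]

# Each split holds out one acquisition geometry entirely from training.
for angle, train, test in leave_one_out_splits(samples, "incidence_deg"):
    assert all(s["incidence_deg"] != angle for s in train)
```

A model evaluated this way reveals generalization across acquisition geometry, which the paper identifies as the primary factor, rather than memorization of a geometry seen in training.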
Advancing mangrove species mapping: An innovative approach using Google Earth images and a U-shaped network for individual-level Sonneratia apetala detection
Chuanpeng Zhao, Yubin Li, Mingming Jia, Chengbin Wu, Rong Zhang, Chunying Ren, Zongming Wang
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 276–293. Published 2024-11-07. DOI: 10.1016/j.isprsjprs.2024.10.016

Abstract: The exotic mangrove species Sonneratia apetala has been colonizing coastal China for several decades, drawing attention and debate from the public and policy-makers about its reproduction, dispersal, and spread. Existing local-scale studies have relied on fine but expensive data sources to map mangrove species, limiting their applicability for detecting S. apetala over large areas due to cost constraints. A previous study utilized freely available Sentinel-2 images to construct a 10-m-resolution S. apetala map of China but could not capture small clusters of S. apetala due to resolution limitations. To precisely detect S. apetala in coastal China, we propose an approach that integrates freely accessible submeter-resolution Google Earth images to control expenses, a 10-m-resolution S. apetala map to retrieve well-distributed samples, and several U-shaped networks to capture S. apetala both as clusters and as individuals. Comparisons revealed that the lite U-squared network was the most suitable of the five U-shaped networks for detecting S. apetala. The resulting map achieved an overall accuracy of 98.2% on testing samples and 91.0% on field sample plots. Statistics indicated that the total area covered by S. apetala in China was 4000.4 ha in 2022, 33.4% greater than that of the 10-m-resolution map. The additional area points to a large number of small clusters beyond the discrimination capacity of medium-resolution images.

Furthermore, the mechanism of the approach was interpreted using an example-based method that altered image color, shape, orientation, and textures. Comparisons showed that texture was the key feature for identifying S. apetala in submeter-resolution Google Earth images: detection accuracy decreased rapidly as textures were blurred, and images at zoom levels 20, 19, and 18 were usable by the trained network. Utilizing the first individual-level map, we estimated the number of mature S. apetala trees at approximately 2.35 million, with a 95% confidence interval between 2.30 and 2.40 million, providing a basis for managing this exotic mangrove species. This study deepens existing research on S. apetala by providing an approach with a clear mechanism, an individual-level distribution with a much larger mapped area, and an estimate of the number of mature trees. It advances mangrove species mapping by combining the advantages of freely accessible medium- and high-resolution images: the former provide abundant spectral information to integrate discrete local-scale maps into a large-scale map, while the latter offer the textural information needed to detect mangrove species in detail.
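The texture-degradation probe described above can be imitated with a toy experiment: blur a patch and watch a crude texture proxy (local intensity variance) collapse, mirroring the finding that detection accuracy drops as textures blur. The kernel size and the variance proxy are illustrative choices, not the paper's exact method:

```python
import numpy as np

def box_blur(img, k=3):
    """Mean filter with a k x k box kernel, edge-padded."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
patch = rng.random((16, 16))          # high-frequency "texture"
blurred = box_blur(patch)
assert blurred.var() < patch.var()    # blurring removes textural variation
```

Repeating a classifier's inference on progressively blurred inputs, as the example-based interpretation does, isolates how much of its decision rests on texture rather than color or shape.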
HDRSA-Net: Hybrid dynamic residual self-attention network for SAR-assisted optical image cloud and shadow removal
Jun Pan, Jiangong Xu, Xiaoyu Yu, Guo Ye, Mi Wang, Yumin Chen, Jianshen Ma
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 258–275. Published 2024-11-07. DOI: 10.1016/j.isprsjprs.2024.10.026

Abstract: Clouds and shadows often contaminate optical remote sensing images, resulting in missing information. Continuous spatiotemporal monitoring of the Earth's surface therefore requires the efficient removal of clouds and shadows. Unlike optical satellites, synthetic aperture radar (SAR) can image actively in all weather conditions, supplying valuable supplementary information for reconstructing missing regions. Nevertheless, reconstructing high-fidelity cloud-free images through SAR-optical data fusion remains challenging due to differences in imaging mechanisms and the considerable speckle noise inherent in SAR imagery. To address these challenges, this paper presents a novel hybrid dynamic residual self-attention network (HDRSA-Net) that aims to fully exploit the potential of SAR images in reconstructing missing regions. HDRSA-Net comprises multiple dynamic interaction residual (DIR) groups organized into an end-to-end trainable, deep, hierarchically stacked architecture. Specifically, the omni-dimensional dynamic local exploration (ODDLE) module and the sparse global context aggregation (SGCA) module together perform local-global adaptive feature extraction and implicit enhancement. A multi-task cooperative optimization loss function is designed to ensure that the results exhibit high spectral fidelity and coherent spatial structures.

Additionally, this paper releases a large dataset for comprehensively evaluating reconstruction quality under different cloud coverages and various types of ground cover, providing a solid foundation for restoring both visual quality and reliable semantic value. Compared to current representative algorithms, the presented approach reconstructs missing regions effectively and stably. The project is accessible at: https://github.com/RSIIPAC/LuojiaSET-OSFCR.
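A multi-task loss balancing spectral fidelity against spatial structure, in the spirit of the cooperative optimization loss described above, might combine an L1 term with a gradient-consistency term. The specific terms and weights here are illustrative stand-ins, not the HDRSA-Net loss:

```python
import numpy as np

def spectral_l1(pred, target):
    """Mean absolute error: penalizes per-band radiometric deviation."""
    return np.abs(pred - target).mean()

def gradient_loss(pred, target):
    """Penalizes differences in horizontal/vertical gradients (spatial structure)."""
    gx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)).mean()
    return gx + gy

def multi_task_loss(pred, target, w_spec=1.0, w_struct=0.5):
    return w_spec * spectral_l1(pred, target) + w_struct * gradient_loss(pred, target)

target = np.arange(16.0).reshape(4, 4)
assert multi_task_loss(target, target) == 0.0
# A constant radiometric offset leaves gradients intact: only the spectral term fires.
assert multi_task_loss(target + 1.0, target) == 1.0
```

Weighting several complementary penalties this way is what lets a single optimization target trade off spectral fidelity against coherent structure.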
A multi-view graph neural network for building age prediction
Yi Wang, Yizhi Zhang, Quanhua Dong, Hao Guo, Yingchun Tao, Fan Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 294–311. Published 2024-11-07. DOI: 10.1016/j.isprsjprs.2024.10.011

Abstract: Building age is crucial for inferring building energy consumption and understanding the interactions between human behavior and urban infrastructure. Limited by the challenges of surveys, machine learning methods have been used to predict and fill in missing building age data from building footprints. However, existing methods lack explicit modeling of the spatial effects and semantic relationships between buildings. To alleviate these challenges, we propose a novel multi-view graph neural network called the Building Age Prediction Network (BAPN). Features of spatial autocorrelation, spatial heterogeneity, and semantic similarity are extracted and integrated using multiple graph convolutional networks. Inspired by the spatial regime model, a heterogeneity-aware graph convolutional network (HGCN) based on spatial grouping is designed to capture spatial heterogeneity. Systematic experiments on three large-scale building footprint datasets demonstrate that BAPN outperforms existing machine learning and graph learning models, achieving accuracy ranging from 61% to 80%. Moreover, missing building age data within the Fifth Ring Road of Beijing were filled, validating the feasibility of BAPN.

This research offers new insights for filling intra-city building age gaps and for understanding the multiple spatial effects essential to sustainable urban planning.
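The multi-view idea, aggregating over different graphs built from the same buildings and then fusing, can be reduced to a minimal sketch: one mean-aggregation ("GCN-like") pass per view, fused by averaging. BAPN's actual layers, spatial grouping, and fusion are richer than this, and the graphs below are illustrative:

```python
import numpy as np

def normalize_adj(A):
    """Row-normalized adjacency with self-loops: each row averages node + neighbors."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def multi_view_pass(X, adjs):
    """One aggregation pass per view (graph), fused by simple averaging."""
    views = [normalize_adj(A) @ X for A in adjs]
    return np.mean(views, axis=0)

X = np.array([[1.0], [3.0], [5.0]])          # one scalar feature per building
A_spatial = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)   # spatial adjacency
A_semantic = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], float)  # semantic similarity
H = multi_view_pass(X, [A_spatial, A_semantic])   # fused node embeddings, shape (3, 1)
```

Each view smooths features over a different notion of neighborhood, so the fused embedding carries both spatial and semantic context into the age predictor.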
Integrating synthetic datasets with CLIP semantic insights for single image localization advancements
Dansheng Yao, Mengqi Zhu, Hehua Zhu, Wuqiang Cai, Long Zhou
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 198–213. Published 2024-11-06. DOI: 10.1016/j.isprsjprs.2024.10.027

Abstract: Accurate localization of pedestrians and mobile robots is critical for navigation, emergency response, and autonomous driving. Traditional localization methods, such as satellite signals, often prove ineffective in certain environments, and acquiring sufficient positional data can be challenging. Single image localization techniques have been developed to address these issues. However, current deep learning frameworks for single image localization that rely on domain adaptation fail to effectively utilize the semantically rich high-level features obtained from large-scale pretraining. This paper introduces a novel framework that leverages the Contrastive Language-Image Pre-training (CLIP) model and prompts to enhance feature extraction and domain adaptation with semantic information. The proposed framework generates an integrated score map from scene-specific prompts to guide feature extraction and employs adversarial components to facilitate domain adaptation. Furthermore, a reslink component is incorporated to mitigate the precision loss in high-level features relative to the original data. Experimental results demonstrate that the use of prompts reduces localization errors by 26.4% in indoor environments and 24.3% in outdoor settings. The model achieves localization errors as low as 0.75 m and 8.09 degrees indoors, and 4.56 m and 7.68 degrees outdoors. Analysis of prompts from labeled datasets confirms the model's capability to effectively interpret scene information. The weights of the integrated score map enhance the model's transparency, improving interpretability.

This study underscores the efficacy of integrating semantic information into image localization tasks.
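One plausible reading of an integrated score map guiding feature extraction is a spatial reweighting of features before pooling: positions the prompts score highly contribute more to the pooled descriptor. The softmax normalization and pooling choice below are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def score_weighted_pool(features, scores):
    """Pool (H, W, C) features into a C-vector, weighted by a (H, W) score map."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()                        # softmax over all spatial positions
    return (features * w[..., None]).sum(axis=(0, 1))

features = np.ones((2, 2, 3))              # toy feature map, 3 channels
scores = np.array([[0.0, 1.0],             # toy prompt-derived relevance scores
                   [2.0, 3.0]])
pooled = score_weighted_pool(features, scores)
# Spatially uniform features are invariant to the weighting: pooled == [1, 1, 1].
```

Because the weights sum to one and are inspectable, such a map doubles as an interpretability signal, which matches the transparency claim above.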
Selective weighted least square and piecewise bilinear transformation for accurate satellite DSM generation
Nazila Mohammadi, Amin Sedaghat
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 214–230. Published 2024-11-06. DOI: 10.1016/j.isprsjprs.2024.11.001

Abstract: One of the main products of multi-view stereo (MVS) high-resolution satellite (HRS) imagery in photogrammetry and remote sensing is the digital surface model (DSM). Producing DSMs from MVS HRS images still faces serious challenges, owing to the complexity of the imaging geometry and exterior orientation model of HRS, as well as large image dimensions and varied geometric and illumination conditions. The main motivation for this research is to provide a novel, efficient method that improves the accuracy and completeness of DSM extraction from HRS images over existing recent methods. The proposed method, called Sat-DSM, consists of five main stages. Initially, a very dense set of tie-points is extracted from the images using a tile-based matching method, phase-congruency-based feature detectors and descriptors, and a local geometric consistency correspondence method. Then, Rational Polynomial Coefficients (RPC) block adjustment is performed to compensate for RPC bias errors. After that, a dense matching process generates 3D point clouds for each pair of input HRS images using a new geometric transformation called PWB (piecewise bilinear) and an accurate area-based matching method called SWLSM (selective weighted least square matching). The key innovations of this research are the SWLSM and PWB methods for accurate dense matching. PWB is a simple piecewise geometric transformation model based on superpixel over-segmentation, proposed for accurate registration of each pair of HRS images.

The SWLSM matching method builds on a phase congruency measure and a selection strategy to improve the performance of the well-known LSM (least square matching). After dense matching, the final stage is spatial intersection to generate 3D point clouds, followed by elevation interpolation to produce the DSM. To evaluate Sat-DSM, 12 sets of MVS HRS data from IRS-P5, ZY3-1, ZY3-2, and WorldView-3 sensors were selected over areas with different landscapes, including urban, mountainous, and agricultural areas. The results indicate the superiority of the proposed Sat-DSM method over four other methods (CATALYST, SGM (semi-global matching), SS-DSM (structural similarity based DSM extraction), and Sat-MVSF) in terms of completeness, RMSE, and MEE. The demo code is available at https://www.researchgate.net/publication/377721674_SatDSM.
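At the core of LSM-style area-based matching is a least squares estimate of the geometric offset between template and patch from linearized intensity residuals. The single-parameter, one-step 1-D version below conveys the idea; SWLSM adds phase-congruency-based weights and a selection strategy on top, and the weight vector here is only a placeholder:

```python
import numpy as np

def lsm_shift(template, patch, weights=None):
    """One Gauss-Newton step for a single shift parameter:
    solve sum(w*g*g)*ds = sum(w*g*r) with g the patch gradient, r the residual."""
    g = np.gradient(patch)                      # dI/dx
    r = template - patch                        # intensity residuals
    w = np.ones_like(r) if weights is None else weights
    return (w * g * r).sum() / (w * g * g).sum()

x = np.arange(10.0)
template = 2.0 * x                 # linear signal: one step recovers the shift exactly
patch = 2.0 * (x - 0.3)            # same signal shifted by 0.3 samples
shift = lsm_shift(template, patch)  # ~0.3
```

Replacing the uniform weights with a phase-congruency-derived vector downweights low-structure pixels, which is the role the "selective weighted" part of SWLSM plays in spirit.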
Word2Scene: Efficient remote sensing image scene generation with only one word via hybrid intelligence and low-rank representation
Jiaxin Ren, Wanzeng Liu, Jun Chen, Shunxi Yin, Yuan Tao
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 231–257. Published 2024-11-06. DOI: 10.1016/j.isprsjprs.2024.11.002

Abstract: Current remote sensing scene generation methods face numerous challenges, such as capturing the complex interrelations among geographical features and integrating implicit expert knowledge into generative models. To address these, this paper proposes Word2Scene, an efficient method for generating remote sensing scenes using hybrid intelligence and low-rank representation, which can generate complex scenes from just one word. The approach combines geographic expert knowledge to optimize the remote sensing scene description, enhancing the accuracy and interpretability of the input descriptions. By employing a diffusion model based on hybrid intelligence and low-rank representation techniques, the method endows the diffusion model with the capability to understand remote sensing scene concepts and significantly improves its training efficiency. The study also introduces the geographic scene holistic perceptual similarity (GSHPS), a novel evaluation metric that holistically assesses the performance of generative models from a global perspective. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art models in remote sensing scene generation quality, efficiency, and realism. Compared to the original diffusion models, LPIPS decreased by 18.52% (from 0.81 to 0.66) and GSHPS increased by 28.57% (from 0.70 to 0.90), validating the effectiveness and advancement of the method. Moreover, Word2Scene can generate remote sensing scenes not present in the training set, showcasing strong zero-shot capabilities.

This provides a new perspective and solution for remote sensing image scene generation, with the potential to advance remote sensing, geographic information systems, and related fields. The code will be released at https://github.com/jaycecd/Word2Scene.
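The reported relative changes check out arithmetically: LPIPS (lower is better) falls by 18.52% and GSHPS rises by 28.57%:

```python
def pct_change(before, after):
    """Relative change between two metric values, in percent."""
    return 100.0 * (after - before) / before

assert round(pct_change(0.81, 0.66), 2) == -18.52  # LPIPS drop
assert round(pct_change(0.70, 0.90), 2) == 28.57   # GSHPS gain
```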
A_OPTRAM-ET: An automatic optical trapezoid model for evapotranspiration estimation and its global-scale assessments
Zhaoyuan Yao, Wangyipu Li, Yaokui Cui
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 181–197. Published 2024-11-02. DOI: 10.1016/j.isprsjprs.2024.10.019

Abstract: Remotely sensed evapotranspiration (ET) at high spatial resolution (30 m) has wide-ranging applications in agriculture, hydrology, and meteorology. The original optical trapezoid model for ET (O_OPTRAM-ET), which does not require thermal remote sensing, shows potential for high-resolution ET estimation. However, the non-automated O_OPTRAM-ET depends heavily on visual interpretation or optimization against in situ measurements, limiting its practical utility. In this study, a SpatioTemporal Aggregated Regression algorithm (STAR) is proposed to build an automatic trapezoid model for ET (A_OPTRAM-ET), implemented within the Google Earth Engine environment and evaluated globally at both moderate and high resolutions (500 m and 30 m, respectively). By integrating an aggregation algorithm across multiple dimensions to determine its parameters automatically, A_OPTRAM-ET operates efficiently without ground-based measurements as input. Evaluation against in situ ET demonstrates that A_OPTRAM-ET effectively estimates ET across various land cover types and satellite platforms. The overall root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (CC) against in situ latent heat flux (LE) measurements are 35.5 W·m⁻² (41.3 W·m⁻², 40.0 W·m⁻², 36.1 W·m⁻²), 26.3 W·m⁻² (28.9 W·m⁻², 28.7 W·m⁻², 25.8 W·m⁻²), and 0.78 (0.73, 0.70, 0.72) for Sentinel-2 (Landsat-8, Landsat-5, MOD09GA), respectively. The model maintains stable accuracy over long time periods (approximately 10 years).

Compared with other published ET datasets, ET estimated by A_OPTRAM-ET performs better over cropland and shrubland. Additionally, global ET derived from A_OPTRAM-ET shows trends consistent with other published ET datasets over 2001–2020 while offering enhanced spatial detail. The proposed A_OPTRAM-ET model therefore provides an efficient, high-resolution, globally applicable method for ET estimation, with significant practical value for agriculture, hydrology, and related fields.
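For context, the optical trapezoid formulation that OPTRAM-style models build on normalizes SWIR transformed reflectance (STR) between dry and wet edges that vary linearly with a vegetation index; A_OPTRAM-ET's contribution is fitting those edge parameters automatically via STAR. The edge parameters below are illustrative, and the STR definition follows the common OPTRAM literature rather than this paper's exact implementation:

```python
import numpy as np

def str_index(swir):
    """SWIR transformed reflectance: STR = (1 - R_swir)^2 / (2 * R_swir)."""
    return (1.0 - swir) ** 2 / (2.0 * swir)

def trapezoid_norm(swir, ndvi, i_dry, s_dry, i_wet, s_wet):
    """Normalize STR between the dry (low-STR) and wet (high-STR) trapezoid edges,
    each modeled as a linear function of NDVI."""
    str_ = str_index(swir)
    dry = i_dry + s_dry * ndvi
    wet = i_wet + s_wet * ndvi
    return np.clip((str_ - dry) / (wet - dry), 0.0, 1.0)

swir = np.array([0.5, 0.2])   # toy SWIR reflectances
ndvi = np.array([0.2, 0.6])   # toy NDVI values
# Edge coefficients are illustrative; STAR would estimate them from the data itself.
w = trapezoid_norm(swir, ndvi, i_dry=0.0, s_dry=1.0, i_wet=4.0, s_wet=2.0)
```

The resulting normalized index lies in [0, 1] and serves as the optical stand-in for surface moisture status that the ET estimate is then built on.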
Atmospheric correction of geostationary ocean color imager data over turbid coastal waters under high solar zenith angles
Hao Li, Xianqiang He, Palanisamy Shanmugam, Yan Bai, Xuchen Jin, Zhihong Wang, Yifan Zhang, Difeng Wang, Fang Gong, Min Zhao
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 166–180. Published 2024-10-31. DOI: 10.1016/j.isprsjprs.2024.10.018

Abstract: Traditional atmospheric correction models employing near-infrared iterative schemes inaccurately estimate aerosol radiance at high solar zenith angles (SZAs), leading to a substantial loss of valid products from dawn or dusk observations by geostationary satellite ocean color sensors. To overcome this issue, we previously developed an atmospheric correction model for open ocean waters observed by the first geostationary ocean color imager (GOCI) under high SZAs. That model was constructed from a dataset of stable open ocean waters, making it less suitable for coastal waters. In this study, we developed a specialized atmospheric correction model (GOCI-II-NN) capable of accurately retrieving the water-leaving radiance from GOCI-II observations over coastal oceans under high SZAs. We used multiple GOCI-II observations throughout the day to develop selection criteria for extracting stable coastal water pixels and created a new training dataset for the proposed model. The performance of the GOCI-II-NN model was validated against in situ data collected from coastal/shelf waters. The results showed an Average Percentage Difference (APD) of less than 23% across the entire visible spectrum.

In terms of valid data coverage and retrieval accuracy, the GOCI-II-NN model was superior to the traditional near-infrared and ultraviolet atmospheric correction models, accurately retrieving ocean color products for applications such as tracking and monitoring algal blooms, sediment dynamics, and water quality.
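The APD statistic quoted above is commonly computed as the mean absolute relative difference against in situ values, in percent; the paper's exact definition may differ slightly:

```python
import numpy as np

def apd(retrieved, in_situ):
    """Average Percentage Difference: mean |retrieved - in_situ| / |in_situ|, in %."""
    return 100.0 * np.mean(np.abs(retrieved - in_situ) / np.abs(in_situ))

in_situ = np.array([2.0, 4.0])     # toy in-situ radiances
retrieved = np.array([2.2, 3.0])   # +10% and -25% errors -> APD = 17.5%
assert abs(apd(retrieved, in_situ) - 17.5) < 1e-9
```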
Cascaded recurrent networks with masked representation learning for stereo matching of high-resolution satellite images
Zhibo Rao, Xing Li, Bangshu Xiong, Yuchao Dai, Zhelun Shen, Hangbiao Li, Yue Lou
ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 218, pp. 151–165. Published 2024-10-30. DOI: 10.1016/j.isprsjprs.2024.10.017

Abstract: Stereo matching of satellite images presents challenges due to missing data, domain differences, and imperfect rectification. To address these issues, we propose cascaded recurrent networks with masked representation learning for high-resolution satellite stereo images, consisting of a feature extraction module and cascaded recurrent modules. First, the correlation computation in the cascaded recurrent module searches for results on the epipolar line and in adjacent areas, mitigating the impact of erroneous rectification. Second, a training strategy based on masked representation learning handles missing data and differing domain attributes, enhancing data utilization and feature representation. The training strategy has two stages: (1) an image reconstruction stage, in which masked left or right images are fed to the feature extraction module and a reconstruction decoder reconstructs the original images as pre-training, yielding a pre-trained feature extraction module; and (2) a stereo matching stage, in which the parameters of the feature extraction module are locked and stereo image pairs are used to train the cascaded recurrent module to obtain the final model. We implement the cascaded recurrent networks with two well-known feature extraction modules (CNN-based Restormer or Transformer-based ViT) to demonstrate the effectiveness of the approach.

Experimental results on the US3D and WHU-Stereo datasets show that: (1) our training strategy can be applied to CNN-based and Transformer-based methods on remote sensing datasets with limited data to improve performance, outperforming the second-best network, HMSM-Net, by approximately 0.54% and 1.95% in 3-px error on the WHU-Stereo and US3D datasets, respectively; (2) our correlation scheme can handle imperfect rectification, reducing the error rate by 8.9% in the random shift test; and (3) our method predicts high-quality disparity maps and achieves state-of-the-art performance, reducing the 3-px error to 12.87% and 7.01% on the WHU-Stereo and US3D datasets, respectively. The source code is released at https://github.com/Archaic-Atom/MaskCRNet.
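The 3-px error cited above is the percentage of pixels whose predicted disparity deviates from ground truth by more than three pixels (some stereo benchmarks add a relative threshold; this sketch uses the absolute criterion only):

```python
import numpy as np

def three_px_error(pred, gt, valid=None):
    """Percentage of (optionally masked) pixels with |disparity error| > 3 px."""
    err = np.abs(pred - gt)
    if valid is not None:
        err = err[valid]
    return 100.0 * np.mean(err > 3.0)

gt = np.array([10.0, 20.0, 30.0, 40.0])
pred = np.array([10.5, 24.0, 29.0, 50.0])   # errors: 0.5, 4, 1, 10 -> 2 of 4 exceed 3 px
assert three_px_error(pred, gt) == 50.0
```

A `valid` mask matters on satellite benchmarks such as WHU-Stereo, where occluded or no-data pixels are conventionally excluded from the statistic.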