Zhibo Rao , Xing Li , Bangshu Xiong , Yuchao Dai , Zhelun Shen , Hangbiao Li , Yue Lou
{"title":"Cascaded recurrent networks with masked representation learning for stereo matching of high-resolution satellite images","authors":"Zhibo Rao , Xing Li , Bangshu Xiong , Yuchao Dai , Zhelun Shen , Hangbiao Li , Yue Lou","doi":"10.1016/j.isprsjprs.2024.10.017","DOIUrl":null,"url":null,"abstract":"<div><div>Stereo matching of satellite images presents challenges due to missing data, domain differences, and imperfect rectification. To address these issues, we propose cascaded recurrent networks with masked representation learning for high-resolution satellite stereo images, consisting of feature extraction and cascaded recurrent modules. First, we develop the correlation computation in the cascaded recurrent module to search for results on the epipolar line and adjacent areas, mitigating the impacts of erroneous rectification. Second, we use a training strategy based on masked representation learning to handle missing data and different domain attributes, enhancing data utilization and feature representation. Our training strategy includes two stages: (1) image reconstruction stage. We feed masked left or right images to the feature extraction module and adopt a reconstruction decoder to reconstruct the original images as a pre-training process, obtaining a pre-trained feature extraction module; (2) the stereo matching stage. We lock the parameters of the feature extraction module and employ stereo image pairs to train the cascaded recurrent module to get the final model. We implement the cascaded recurrent networks with two well-known feature extraction modules (CNN-based Restormer or Transformer-based ViT) to prove the effectiveness of our approach. Experimental results on the US3D and WHU-Stereo datasets show that: (1) Our training strategy can be used for CNN-based and Transformer-based methods on the remote sensing datasets with limited data to improve performance, outperforming the second-best network HMSM-Net by approximately 0.54% and 1.95% in terms of the percentage of the 3-px error on the WHU-Stereo and US3D datasets, respectively; (2) Our correlation manner can handle imperfect rectification, reducing the error rate by 8.9% on the random shift test; (3) Our method can predict high-quality disparity maps and achieve state-of-the-art performance, reducing the percentage of the 3-px error to 12.87% and 7.01% on the WHU-Stereo and US3D datasets, respectively. The source codes are released at <span><span>https://github.com/Archaic-Atom/MaskCRNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 151-165"},"PeriodicalIF":10.6000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISPRS Journal of Photogrammetry and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0924271624003940","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY, PHYSICAL","Score":null,"Total":0}
引用次数: 0
Abstract
Stereo matching of satellite images presents challenges due to missing data, domain differences, and imperfect rectification. To address these issues, we propose cascaded recurrent networks with masked representation learning for high-resolution satellite stereo images, consisting of feature extraction and cascaded recurrent modules. First, we develop the correlation computation in the cascaded recurrent module to search for results on the epipolar line and adjacent areas, mitigating the impacts of erroneous rectification. Second, we use a training strategy based on masked representation learning to handle missing data and different domain attributes, enhancing data utilization and feature representation. Our training strategy includes two stages: (1) image reconstruction stage. We feed masked left or right images to the feature extraction module and adopt a reconstruction decoder to reconstruct the original images as a pre-training process, obtaining a pre-trained feature extraction module; (2) the stereo matching stage. We lock the parameters of the feature extraction module and employ stereo image pairs to train the cascaded recurrent module to get the final model. We implement the cascaded recurrent networks with two well-known feature extraction modules (CNN-based Restormer or Transformer-based ViT) to prove the effectiveness of our approach. Experimental results on the US3D and WHU-Stereo datasets show that: (1) Our training strategy can be used for CNN-based and Transformer-based methods on the remote sensing datasets with limited data to improve performance, outperforming the second-best network HMSM-Net by approximately 0.54% and 1.95% in terms of the percentage of the 3-px error on the WHU-Stereo and US3D datasets, respectively; (2) Our correlation manner can handle imperfect rectification, reducing the error rate by 8.9% on the random shift test; (3) Our method can predict high-quality disparity maps and achieve state-of-the-art performance, reducing the percentage of the 3-px error to 12.87% and 7.01% on the WHU-Stereo and US3D datasets, respectively. The source codes are released at https://github.com/Archaic-Atom/MaskCRNet.
期刊介绍:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.