用于高分辨率卫星图像立体匹配的具有遮蔽表示学习功能的级联递归网络

IF 10.6 1区地球科学 Q1 GEOGRAPHY, PHYSICAL

ISPRS Journal of Photogrammetry and Remote Sensing Pub Date : 2024-10-30 DOI:10.1016/j.isprsjprs.2024.10.017

Zhibo Rao , Xing Li , Bangshu Xiong , Yuchao Dai , Zhelun Shen , Hangbiao Li , Yue Lou

{"title":"用于高分辨率卫星图像立体匹配的具有遮蔽表示学习功能的级联递归网络","authors":"Zhibo Rao , Xing Li , Bangshu Xiong , Yuchao Dai , Zhelun Shen , Hangbiao Li , Yue Lou","doi":"10.1016/j.isprsjprs.2024.10.017","DOIUrl":null,"url":null,"abstract":"<div><div>Stereo matching of satellite images presents challenges due to missing data, domain differences, and imperfect rectification. To address these issues, we propose cascaded recurrent networks with masked representation learning for high-resolution satellite stereo images, consisting of feature extraction and cascaded recurrent modules. First, we develop the correlation computation in the cascaded recurrent module to search for results on the epipolar line and adjacent areas, mitigating the impacts of erroneous rectification. Second, we use a training strategy based on masked representation learning to handle missing data and different domain attributes, enhancing data utilization and feature representation. Our training strategy includes two stages: (1) image reconstruction stage. We feed masked left or right images to the feature extraction module and adopt a reconstruction decoder to reconstruct the original images as a pre-training process, obtaining a pre-trained feature extraction module; (2) the stereo matching stage. We lock the parameters of the feature extraction module and employ stereo image pairs to train the cascaded recurrent module to get the final model. We implement the cascaded recurrent networks with two well-known feature extraction modules (CNN-based Restormer or Transformer-based ViT) to prove the effectiveness of our approach. Experimental results on the US3D and WHU-Stereo datasets show that: (1) Our training strategy can be used for CNN-based and Transformer-based methods on the remote sensing datasets with limited data to improve performance, outperforming the second-best network HMSM-Net by approximately 0.54% and 1.95% in terms of the percentage of the 3-px error on the WHU-Stereo and US3D datasets, respectively; (2) Our correlation manner can handle imperfect rectification, reducing the error rate by 8.9% on the random shift test; (3) Our method can predict high-quality disparity maps and achieve state-of-the-art performance, reducing the percentage of the 3-px error to 12.87% and 7.01% on the WHU-Stereo and US3D datasets, respectively. The source codes are released at <span><span>https://github.com/Archaic-Atom/MaskCRNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 151-165"},"PeriodicalIF":10.6000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cascaded recurrent networks with masked representation learning for stereo matching of high-resolution satellite images\",\"authors\":\"Zhibo Rao , Xing Li , Bangshu Xiong , Yuchao Dai , Zhelun Shen , Hangbiao Li , Yue Lou\",\"doi\":\"10.1016/j.isprsjprs.2024.10.017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Stereo matching of satellite images presents challenges due to missing data, domain differences, and imperfect rectification. To address these issues, we propose cascaded recurrent networks with masked representation learning for high-resolution satellite stereo images, consisting of feature extraction and cascaded recurrent modules. First, we develop the correlation computation in the cascaded recurrent module to search for results on the epipolar line and adjacent areas, mitigating the impacts of erroneous rectification. Second, we use a training strategy based on masked representation learning to handle missing data and different domain attributes, enhancing data utilization and feature representation. Our training strategy includes two stages: (1) image reconstruction stage. We feed masked left or right images to the feature extraction module and adopt a reconstruction decoder to reconstruct the original images as a pre-training process, obtaining a pre-trained feature extraction module; (2) the stereo matching stage. We lock the parameters of the feature extraction module and employ stereo image pairs to train the cascaded recurrent module to get the final model. We implement the cascaded recurrent networks with two well-known feature extraction modules (CNN-based Restormer or Transformer-based ViT) to prove the effectiveness of our approach. Experimental results on the US3D and WHU-Stereo datasets show that: (1) Our training strategy can be used for CNN-based and Transformer-based methods on the remote sensing datasets with limited data to improve performance, outperforming the second-best network HMSM-Net by approximately 0.54% and 1.95% in terms of the percentage of the 3-px error on the WHU-Stereo and US3D datasets, respectively; (2) Our correlation manner can handle imperfect rectification, reducing the error rate by 8.9% on the random shift test; (3) Our method can predict high-quality disparity maps and achieve state-of-the-art performance, reducing the percentage of the 3-px error to 12.87% and 7.01% on the WHU-Stereo and US3D datasets, respectively. The source codes are released at <span><span>https://github.com/Archaic-Atom/MaskCRNet</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50269,\"journal\":{\"name\":\"ISPRS Journal of Photogrammetry and Remote Sensing\",\"volume\":\"218 \",\"pages\":\"Pages 151-165\"},\"PeriodicalIF\":10.6000,\"publicationDate\":\"2024-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ISPRS Journal of Photogrammetry and Remote Sensing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0924271624003940\",\"RegionNum\":1,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"GEOGRAPHY, PHYSICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISPRS Journal of Photogrammetry and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0924271624003940","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY, PHYSICAL","Score":null,"Total":0}

引用次数: 0

摘要

卫星图像的立体匹配因数据缺失、域差异和不完善的校正而面临挑战。为解决这些问题，我们提出了针对高分辨率卫星立体图像的级联递归网络，该网络由特征提取模块和级联递归模块组成，具有遮蔽表示学习功能。首先，我们在级联递归模块中开发了相关性计算功能，以搜索外极线和相邻区域的结果，从而减轻错误校正带来的影响。其次，我们使用基于掩码表示学习的训练策略来处理缺失数据和不同的领域属性，从而提高数据利用率和特征表示能力。我们的训练策略包括两个阶段：（1）图像重建阶段。我们向特征提取模块输入遮蔽的左右图像，并采用重构解码器重构原始图像作为预训练过程，从而获得预训练的特征提取模块；（2）立体匹配阶段。我们锁定特征提取模块的参数，利用立体图像对训练级联递归模块，得到最终模型。我们用两个著名的特征提取模块（基于 CNN 的 Restormer 或基于 Transformer 的 ViT）来实现级联递归网络，以证明我们方法的有效性。在 US3D 和 WHU-Stereo 数据集上的实验结果表明(1) 在数据有限的遥感数据集上，我们的训练策略可用于基于 CNN 和基于 Transformer 的方法，以提高性能，在 WHU-Stereo 和 US3D 数据集上的 3-px 误差百分比分别比第二好的网络 HMSM-Net 高出约 0.54% 和 1.95%；(2) 我们的相关方式可以处理不完美的整流，在随机偏移测试中将误差率降低了 8.9%；(3) 我们的方法可以预测高质量的差距图，并达到最先进的性能，在 WHU-Stereo 和 US3D 数据集上的 3-px 误差百分比分别降低到 12.87% 和 7.01%。源代码发布于 https://github.com/Archaic-Atom/MaskCRNet。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cascaded recurrent networks with masked representation learning for stereo matching of high-resolution satellite images

Stereo matching of satellite images presents challenges due to missing data, domain differences, and imperfect rectification. To address these issues, we propose cascaded recurrent networks with masked representation learning for high-resolution satellite stereo images, consisting of feature extraction and cascaded recurrent modules. First, we develop the correlation computation in the cascaded recurrent module to search for results on the epipolar line and adjacent areas, mitigating the impacts of erroneous rectification. Second, we use a training strategy based on masked representation learning to handle missing data and different domain attributes, enhancing data utilization and feature representation. Our training strategy includes two stages: (1) image reconstruction stage. We feed masked left or right images to the feature extraction module and adopt a reconstruction decoder to reconstruct the original images as a pre-training process, obtaining a pre-trained feature extraction module; (2) the stereo matching stage. We lock the parameters of the feature extraction module and employ stereo image pairs to train the cascaded recurrent module to get the final model. We implement the cascaded recurrent networks with two well-known feature extraction modules (CNN-based Restormer or Transformer-based ViT) to prove the effectiveness of our approach. Experimental results on the US3D and WHU-Stereo datasets show that: (1) Our training strategy can be used for CNN-based and Transformer-based methods on the remote sensing datasets with limited data to improve performance, outperforming the second-best network HMSM-Net by approximately 0.54% and 1.95% in terms of the percentage of the 3-px error on the WHU-Stereo and US3D datasets, respectively; (2) Our correlation manner can handle imperfect rectification, reducing the error rate by 8.9% on the random shift test; (3) Our method can predict high-quality disparity maps and achieve state-of-the-art performance, reducing the percentage of the 3-px error to 12.87% and 7.01% on the WHU-Stereo and US3D datasets, respectively. The source codes are released at https://github.com/Archaic-Atom/MaskCRNet.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ISPRS Journal of Photogrammetry and Remote Sensing 工程技术-成像科学与照相技术

CiteScore

21.00

自引率

6.30%

发文量

273

审稿时长

40 days

期刊介绍： The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive. P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields. In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.