Integrating remote sensing with OpenStreetMap data for comprehensive scene understanding through multi-modal self-supervised learning

Impact Factor 11.1 · CAS Tier 1 (Earth Science) · JCR Q1 (Environmental Sciences)
Lubin Bai, Xiuyuan Zhang, Haoyu Wang, Shihong Du
Journal: Remote Sensing of Environment
DOI: 10.1016/j.rse.2024.114573
Published: 2024-12-23
Citations: 0

Abstract

OpenStreetMap (OSM) contains valuable geographic knowledge for remote sensing (RS) interpretation. The two sources provide correlated and complementary descriptions of a given region, so integrating RS images with OSM data can lead to a more comprehensive understanding of a geographic scene. However, because of the significant differences between the two, little progress has been made in fusing RS and OSM data, and how to extract, exchange, and jointly exploit information from multiple geographic data sources remains largely unexplored. In this work, we design a multi-modal self-supervised learning (SSL) approach that fuses RS images and OSM data, extracting meaningful features from the two complementary sources in an unsupervised manner and yielding comprehensive scene understanding. We unify information extraction, interaction, and collaboration for RS and OSM data in a single SSL framework named Rose. For information extraction, we start from the complementarity of the two modalities and design an OSM encoder that aligns naturally with the ViT image encoder. For information interaction, we exploit the spatial correlation between RS and OSM data to guide a cross-attention module, strengthening information transfer between the modalities. For information collaboration, we design a joint mask-reconstruction learning strategy in which the masked inputs are reconstructed by drawing on information from both sources, achieving cooperation between the two modalities. The three parts are interlinked and blend seamlessly into a unified framework.
Finally, Rose produces three kinds of representations, i.e., RS features, OSM features, and RS-OSM fusion features, which can be used for multiple downstream tasks. Extensive experiments on land use semantic segmentation, population estimation, and carbon emission estimation tasks demonstrate Rose's multitasking capability, label efficiency, and robustness to noise. Rose associates RS images and OSM data at a fine level of granularity, which strengthens its effectiveness on fine-grained tasks such as land use semantic segmentation. The code can be found at https://github.com/bailubin/Rose.
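The core fusion step the abstract describes, a cross-attention module in which RS image tokens attend to OSM tokens, can be illustrated with a minimal single-head NumPy sketch. All dimensions, variable names, and the single-head formulation here are illustrative assumptions for exposition, not the paper's actual implementation (which additionally uses spatial correlation to guide the attention).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rs_tokens, osm_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: RS tokens query OSM tokens.

    rs_tokens:  (n_rs, d)  patch embeddings from the image encoder
    osm_tokens: (n_osm, d) embeddings from a hypothetical OSM encoder
    Returns fused RS features of shape (n_rs, d).
    """
    q = rs_tokens @ Wq                         # queries from RS modality
    k = osm_tokens @ Wk                        # keys from OSM modality
    v = osm_tokens @ Wv                        # values from OSM modality
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product, (n_rs, n_osm)
    attn = softmax(scores, axis=-1)            # each RS token's weights over OSM tokens
    return attn @ v                            # OSM information injected into RS tokens

rng = np.random.default_rng(0)
d, n_rs, n_osm = 16, 4, 6                      # toy sizes, purely illustrative
rs = rng.normal(size=(n_rs, d))
osm = rng.normal(size=(n_osm, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
fused = cross_attention(rs, osm, Wq, Wk, Wv)
print(fused.shape)  # (4, 16)
```

In a joint mask-reconstruction setup, random subsets of `rs` and `osm` tokens would be replaced by a learned mask token before encoding, and the training loss would compare the decoder's reconstruction of the masked positions against the original inputs of both modalities.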
Source Journal

Remote Sensing of Environment (Environmental Sciences; Imaging Science & Photographic Technology)
CiteScore: 25.10
Self-citation rate: 8.90%
Articles per year: 455
Review time: 53 days
Journal description: Remote Sensing of Environment (RSE) serves the Earth observation community by disseminating results on the theory, science, applications, and technology that contribute to advancing the field of remote sensing. With a thoroughly interdisciplinary approach, RSE encompasses terrestrial, oceanic, and atmospheric sensing. The journal emphasizes biophysical and quantitative approaches to remote sensing at local to global scales, covering a diverse range of applications and techniques.