A semantic-enhanced multi-modal remote sensing foundation model for Earth observation

Impact factor: 23.9 · CAS Tier 1 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
Kang Wu, Yingying Zhang, Lixiang Ru, Bo Dang, Jiangwei Lao, Lei Yu, Junwei Luo, Zifan Zhu, Yue Sun, Jiahao Zhang, Qi Zhu, Jian Wang, Ming Yang, Jingdong Chen, Yongjun Zhang, Yansheng Li
{"title":"A semantic-enhanced multi-modal remote sensing foundation model for Earth observation","authors":"Kang Wu, Yingying Zhang, Lixiang Ru, Bo Dang, Jiangwei Lao, Lei Yu, Junwei Luo, Zifan Zhu, Yue Sun, Jiahao Zhang, Qi Zhu, Jian Wang, Ming Yang, Jingdong Chen, Yongjun Zhang, Yansheng Li","doi":"10.1038/s42256-025-01078-8","DOIUrl":null,"url":null,"abstract":"Remote sensing foundation models, pretrained on massive remote sensing data, have shown impressive performance in several Earth observation (EO) tasks. These models usually use single-modal temporal data for pretraining, which is insufficient for multi-modal applications. Moreover, these models require a considerable number of samples for fine-tuning in downstream tasks, posing challenges in time-sensitive scenarios, such as rapid flood mapping. We present SkySense++, a multi-modal remote sensing foundation model for diverse EO tasks. SkySense++ has a factorized architecture to accommodate multi-modal images acquired by diverse sensors. We adopt progressive pretraining, which involves two stages, on meticulously curated datasets of 27 million multi-modal remote sensing images. The first representation-enhanced pretraining stage uses multi-granularity contrastive learning to obtain general representations. The second semantic-enhanced pretraining stage leverages masked semantic learning to learn semantically enriched representations, enabling few-shot capabilities. This ability allows the model to handle unseen tasks with minimal labelled data, alleviating the need for fine-tuning on extensive annotated data. SkySense++ demonstrates consistent improvements in classification, detection and segmentation over previous state-of-the-art models across 12 EO tasks in 7 domains: agriculture, forestry, oceanography, atmosphere, biology, land surveying and disaster management. This generalizability may lead to a new chapter of remote sensing foundation model applications for EO tasks at scale. Wu et al. developed SkySense++, a multi-modal remote sensing foundation model pretrained on 27 million multi-modal images, which achieved robust generalization and few-shot capabilities across several Earth observation tasks and domains, including agriculture and disaster management.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 8","pages":"1235-1249"},"PeriodicalIF":23.9000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.nature.com/articles/s42256-025-01078-8","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Remote sensing foundation models, pretrained on massive remote sensing data, have shown impressive performance in several Earth observation (EO) tasks. These models usually use single-modal temporal data for pretraining, which is insufficient for multi-modal applications. Moreover, these models require a considerable number of samples for fine-tuning in downstream tasks, posing challenges in time-sensitive scenarios, such as rapid flood mapping. We present SkySense++, a multi-modal remote sensing foundation model for diverse EO tasks. SkySense++ has a factorized architecture to accommodate multi-modal images acquired by diverse sensors. We adopt progressive pretraining, which involves two stages, on meticulously curated datasets of 27 million multi-modal remote sensing images. The first representation-enhanced pretraining stage uses multi-granularity contrastive learning to obtain general representations. The second semantic-enhanced pretraining stage leverages masked semantic learning to learn semantically enriched representations, enabling few-shot capabilities. This ability allows the model to handle unseen tasks with minimal labelled data, alleviating the need for fine-tuning on extensive annotated data. SkySense++ demonstrates consistent improvements in classification, detection and segmentation over previous state-of-the-art models across 12 EO tasks in 7 domains: agriculture, forestry, oceanography, atmosphere, biology, land surveying and disaster management. This generalizability may lead to a new chapter of remote sensing foundation model applications for EO tasks at scale. Wu et al. developed SkySense++, a multi-modal remote sensing foundation model pretrained on 27 million multi-modal images, which achieved robust generalization and few-shot capabilities across several Earth observation tasks and domains, including agriculture and disaster management.
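The two-stage recipe described in the abstract, a first contrastive stage that learns general representations followed by a masked semantic-learning stage, can be summarized with a minimal sketch. The code below is not the authors' implementation: the encoder architecture, loss functions, mask ratio, and number of pseudo-semantic classes are all assumptions made purely for illustration.

```python
# Illustrative sketch only (not SkySense++ code): two pretraining objectives in the
# spirit of the abstract -- (1) cross-modal contrastive learning for general
# representations, (2) masked semantic learning on top of the learned features.
# All names (ModalityEncoder, mask ratio, class count, ...) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Toy per-modality encoder standing in for one branch of a factorized backbone."""
    def __init__(self, in_dim, embed_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.GELU(), nn.Linear(512, embed_dim))

    def forward(self, x):
        return self.net(x)

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss between paired embeddings of two modalities."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)  # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def masked_semantic_loss(features, semantic_ids, mask, classifier):
    """Predict (pseudo-)semantic labels only at masked positions."""
    logits = classifier(features[mask])                     # (num_masked, num_classes)
    return F.cross_entropy(logits, semantic_ids[mask])

# Stage 1: representation-enhanced pretraining (contrastive across two modalities).
optical_enc, sar_enc = ModalityEncoder(768), ModalityEncoder(256)
opt_batch, sar_batch = torch.randn(32, 768), torch.randn(32, 256)   # stand-in batches
loss_stage1 = info_nce(optical_enc(opt_batch), sar_enc(sar_batch))

# Stage 2: semantic-enhanced pretraining (masked semantic learning).
classifier = nn.Linear(256, 100)                  # 100 pseudo-semantic classes (assumed)
features = optical_enc(torch.randn(32, 768))      # token-level features in a real model
semantic_ids = torch.randint(0, 100, (32,))
mask = torch.rand(32) < 0.4                       # mask roughly 40% of positions (assumed)
loss_stage2 = masked_semantic_loss(features, semantic_ids, mask, classifier)
```

In a real pipeline the stage-2 classifier would operate on token-level features produced by the multi-modal backbone pretrained in stage 1, rather than on the toy batches used here.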

Source journal metrics: CiteScore 36.90 · Self-citation rate 2.10% · Annual publications 127
Journal description: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.