Yuntao Du, Yushi Chen, Lingbo Huang, Yahu Yang, Pedram Ghamisi, Qian Du
Citations: 0
Abstract
SUMMIT: A SAR foundation model with multiple auxiliary tasks enhanced intrinsic characteristics
Synthetic Aperture Radar (SAR) is a crucial tool in remote sensing, yet existing deep learning methods are primarily limited in visual representation, neglecting the intrinsic characteristics of SAR and the need for strong generalization across multiple tasks. To address this, we propose SUMMIT (SAR foUndational Model with Multiple auxiliary tasks enhanced Intrinsic characterisTics), a foundational model tailored for SAR image understanding. SUMMIT is pre-trained on the Multi-sensor SAR Image Dataset (MuSID), which contains over 560,000 SAR images. To enhance its feature extraction capability, we introduce a masked image modeling (MIM) framework with self-supervised auxiliary tasks (SSATs): (1) MIM for learning robust structural representations, (2) self-supervised denoising to improve the model’s noise resistance, and (3) space scattering feature enhancement to preserve geometric consistency. Furthermore, we design an auxiliary task coordination module (ATCM) to balance these tasks and ensure effective feature fusion. The resulting self-supervised framework enables SUMMIT to integrate deep learning with SAR’s physical attributes effectively. Extensive experiments across seven datasets and three downstream tasks demonstrate that SUMMIT achieves state-of-the-art performance, particularly in SAR classification, detection, and segmentation. Code and pre-trained model of the proposed SUMMIT will be available at https://github.com/Yunsans/SUMMIT.
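The abstract names three self-supervised objectives and a module that balances them, but gives no formulas. The following is a minimal sketch of the general idea only, under stated assumptions: random patch masking with a reconstruction loss computed on the masked patches (the usual MIM setup), and a fixed weighted sum standing in for the ATCM, whose actual mechanism is not described here. The function names, the task keys `mim`, `denoise`, and `scatter`, the 0.75 mask ratio, and the scalar "patches" are all hypothetical and chosen for illustration; none of this is the authors' implementation.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only.

    Each patch is a single float here to keep the sketch short; a real
    model would compare vectors of pixel values.
    """
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def combine_losses(losses, weights):
    """Weighted average of per-task losses (assumed stand-in for the ATCM)."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * value for name, value in losses.items())
    return weighted / total_weight


if __name__ == "__main__":
    # 196 patches corresponds to a 14 x 14 grid; the ratio is illustrative.
    mask = random_patch_mask(num_patches=196, mask_ratio=0.75)
    print(sum(mask), "of", len(mask), "patches masked")

    # Hypothetical per-task loss values for one training step.
    losses = {"mim": 0.9, "denoise": 0.3, "scatter": 0.6}
    weights = {"mim": 1.0, "denoise": 1.0, "scatter": 1.0}
    print("combined loss:", combine_losses(losses, weights))
```

A fixed weighting is the simplest possible choice; a coordination module such as the one the abstract describes would presumably adjust the balance during training rather than hold it constant.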
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.