AdaptVFMs-RSCD: Advancing Remote Sensing Change Detection from binary to semantic with SAM and CLIP

Wandong Jiang, Yuli Sun, Lin Lei, Gangyao Kuang, Kefeng Ji

ISPRS Journal of Photogrammetry and Remote Sensing, Volume 230, Pages 304–317
DOI: 10.1016/j.isprsjprs.2025.09.010
Published: 2025-09-24
https://www.sciencedirect.com/science/article/pii/S0924271625003636
Cited by: 0
Abstract
Remote Sensing Change Detection (RSCD) is essential for identifying surface changes from remote sensing images (RSIs) and plays a crucial role in land-use planning and disaster assessment. Despite advancements in RSI resolution and AI, most RSCD datasets are binary, hindering the transition to semantic change detection. Vision Foundation Models (VFMs), such as the Segment Anything Model (SAM), introduce new possibilities with robust zero-shot semantic segmentation capabilities, but struggle with RSIs because of characteristics unique to such imagery, including diverse perspectives and scale variations. To address these challenges, an enhanced RSCD method, AdaptVFMs-RSCD, is proposed. This method integrates SAM with Contrastive Language-Image Pre-training (CLIP), capitalizing on CLIP’s ability to establish broad correspondences between images and text and to classify unseen image categories. This integration markedly improves the recognition of land cover types, better aligning VFMs with the specific requirements of RSCD. Additionally, a remote sensing VFM fine-tuning dataset was developed to further enhance SAM’s segmentation performance on RSIs. Furthermore, a semantic information-based change detection module was designed to fully leverage both the change information and semantic information provided by VFMs, achieving state-of-the-art F1 and mIoU scores on the DSIFN (66.94%, 67.84%), CLCD (76.12%, 78.80%), and SYSU (81.14%, 78.35%) datasets. Notably, the comprehensive metrics F1 and mIoU on the SYSU dataset exceeded the second-best scores by 17.57% and 17.77%, respectively. AdaptVFMs-RSCD also facilitates the conversion of binary change detection datasets into semantic change detection datasets, advancing semantic change detection and expanding the application of vision language models and VFMs in remote sensing.
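The abstract describes moving from binary change maps to semantic ("from-to") change maps once each date's image has per-pixel class labels (e.g., from SAM segmentation plus CLIP classification). The following is a minimal, hypothetical sketch of that final step only — the toy class maps, class names, and encoding scheme are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Hypothetical per-pixel class maps for the "before" and "after" images,
# as might be produced upstream by SAM segmentation + CLIP classification.
# Assumed toy classes: 0 = bare land, 1 = building, 2 = vegetation.
before = np.array([[0, 0, 2],
                   [1, 1, 2],
                   [1, 1, 2]])
after = np.array([[1, 1, 2],
                  [1, 1, 0],
                  [1, 1, 2]])

# Binary change map: 1 wherever the class label differs between dates.
binary_change = (before != after).astype(np.uint8)

# Semantic change map: encode each changed pixel as a single integer
# from_class * num_classes + to_class; unchanged pixels get -1.
num_classes = 3
semantic_change = np.where(binary_change == 1,
                           before * num_classes + after, -1)

# Decode the set of distinct (from_class, to_class) transitions present.
pairs = sorted({(int(c) // num_classes, int(c) % num_classes)
                for c in semantic_change.ravel() if c >= 0})
print(binary_change.sum())  # number of changed pixels
print(pairs)
```

Here the binary map flags three changed pixels, while the semantic map additionally records which transition occurred at each (bare land → building, vegetation → bare land), which is the extra information a semantic change detection dataset carries over a binary one.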
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers based on presentations from ISPRS meetings, provided they are significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.