AdaptVFMs-RSCD: Advancing Remote Sensing Change Detection from binary to semantic with SAM and CLIP

Wandong Jiang, Yuli Sun, Lin Lei, Gangyao Kuang, Kefeng Ji

ISPRS Journal of Photogrammetry and Remote Sensing, Volume 230, Pages 304–317
DOI: 10.1016/j.isprsjprs.2025.09.010
Published: 2025-09-24
https://www.sciencedirect.com/science/article/pii/S0924271625003636
Cited by: 0
Abstract
Remote Sensing Change Detection (RSCD) is essential for identifying surface changes from remote sensing images (RSIs) and plays a crucial role in land-use planning and disaster assessment. Despite advancements in RSI resolution and AI, most RSCD datasets are binary, hindering the transition to semantic change detection. Vision Foundation Models (VFMs), such as the Segment Anything Model (SAM), introduce new possibilities with robust zero-shot semantic segmentation capabilities, but struggle with RSIs because of characteristics unique to such imagery, including diverse perspectives and scale variations. To address these challenges, an enhanced RSCD method, AdaptVFMs-RSCD, is proposed. This method integrates SAM with Contrastive Language-Image Pre-training (CLIP), capitalizing on CLIP’s ability to establish broad correspondences between images and text and to classify unseen image categories. This integration markedly improves the recognition of land cover types, better aligning VFMs with the specific requirements of RSCD. Additionally, a remote sensing VFM fine-tuning dataset was developed to further enhance SAM’s segmentation performance on RSIs. Furthermore, a semantic information-based change detection module was designed to fully leverage both the change information and semantic information provided by VFMs, achieving state-of-the-art F1 and mIoU scores on the DSIFN (66.94%, 67.84%), CLCD (76.12%, 78.80%), and SYSU (81.14%, 78.35%) datasets. Notably, the comprehensive metrics F1 and mIoU on the SYSU dataset exceeded the second-best scores by 17.57% and 17.77%, respectively. AdaptVFMs-RSCD also facilitates the conversion of binary change detection datasets into semantic change detection datasets, advancing semantic change detection and expanding the application of vision language models and VFMs in remote sensing.
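The abstract describes moving from binary change maps to semantic ("from-to") change maps once each date's image has per-pixel class labels (e.g., from SAM segmentation plus CLIP classification). The following is a minimal, hypothetical sketch of that final step only — the toy class maps, class names, and encoding scheme are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Hypothetical per-pixel class maps for the "before" and "after" images,
# as might be produced upstream by SAM segmentation + CLIP classification.
# Assumed toy classes: 0 = bare land, 1 = building, 2 = vegetation.
before = np.array([[0, 0, 2],
                   [1, 1, 2],
                   [1, 1, 2]])
after = np.array([[1, 1, 2],
                  [1, 1, 0],
                  [1, 1, 2]])

# Binary change map: 1 wherever the class label differs between dates.
binary_change = (before != after).astype(np.uint8)

# Semantic change map: encode each changed pixel as a single integer
# from_class * num_classes + to_class; unchanged pixels get -1.
num_classes = 3
semantic_change = np.where(binary_change == 1,
                           before * num_classes + after, -1)

# Decode the set of distinct (from_class, to_class) transitions present.
pairs = sorted({(int(c) // num_classes, int(c) % num_classes)
                for c in semantic_change.ravel() if c >= 0})
print(binary_change.sum())  # number of changed pixels
print(pairs)
```

Here the binary map flags three changed pixels, while the semantic map additionally records which transition occurred at each (bare land → building, vegetation → bare land), which is the extra information a semantic change detection dataset carries over a binary one.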
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers based on presentations from ISPRS meetings, provided they are significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.