GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images

IF 12.2 · Zone 1, Earth Sciences · JCR Q1 (GEOGRAPHY, PHYSICAL)

ISPRS Journal of Photogrammetry and Remote Sensing · Pub Date: 2025-09-16 · DOI: 10.1016/j.isprsjprs.2025.09.003

Xuanguang Liu, Chenguang Dai, Lei Ding, Zhenchao Zhang, Yujie Li, Xibing Zuo, Mengmeng Li, Hanyun Wang, Yuzhe Miao

Volume 230, Pages 73-91 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0924271625003557

Citations: 0

Abstract


GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images

Multi-temporal semantic change detection (MT-SCD) provides crucial information for a wide variety of applications, including land use monitoring, urban planning, and sustainable development. However, previous deep learning-based SCD approaches exhibit limitations in time-series semantic change analysis, particularly in understanding Earth surface change dynamics. Specifically, existing methods typically employ Siamese networks to exploit the multi-temporal information. This hinders temporal interaction and fails to model spatio-temporal dependencies comprehensively, causing substantial classification and detection errors in complex scenes. Another key issue is the neglect of temporal transitivity consistency, resulting in predictions that contradict the multi-temporal change chain rules inherent to MT-SCD. Furthermore, existing approaches do not adapt dynamically to the number of observation dates and therefore cannot process time-series remote sensing images (RSIs) with arbitrary time steps. To address these challenges, we propose a graph-enhanced spatio-temporal Mamba (GSTM-SCD) for MT-SCD (including both bi-temporal SCD and time-series SCD). It employs vision state space models to capture the spatio-temporal dependencies in multi-temporal RSIs, and leverages graph modeling to enhance inter-temporal dependencies. First, we employ a single-branch Mamba encoder to efficiently exploit multi-temporal semantics and construct a spatio-temporal graph optimization mechanism to facilitate interactions between multi-temporal RSIs, while maintaining spatial continuity of feature representations. Second, we introduce a bidirectional three-dimensional change scanning strategy to learn underlying semantic change patterns. Finally, a novel loss function tailored for time-series SCD is proposed, which regularizes the multi-temporal topological relationships within the data. The resulting approach, GSTM-SCD, demonstrates significant accuracy improvements compared with state-of-the-art (SOTA) methods. Experiments conducted on four open benchmark datasets (SECOND, Landsat-SCD, WUSU and DynamicEarthNet) demonstrate that our method surpasses the SOTA by 0.53%, 1.66%, 9.32% and 0.78% in SeK, respectively. Moreover, it significantly reduces computational costs in comparison with recent SOTA methods. The associated code is available at: https://github.com/liuxuanguang/GSTM-SCD.
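The "temporal transitivity consistency" that the abstract refers to can be made concrete with a small check over three observation dates. The following is a minimal NumPy sketch of the underlying logic only; it is not the loss function proposed in the paper, and the function names `change_map` and `transitivity_violations` as well as the toy arrays are hypothetical. It assumes each pixel carries one semantic class per date and that "change" means the class differs between two dates.

```python
import numpy as np


def change_map(sem_a, sem_b):
    """Binary change map: True where the semantic class differs between two dates."""
    return np.asarray(sem_a) != np.asarray(sem_b)


def transitivity_violations(c12, c23, c13):
    """Flag pixels whose three pairwise change predictions cannot coexist.

    Chain rules implied by per-date class labels:
      - no change t1->t2 and no change t2->t3  =>  no change t1->t3
      - change in exactly one of the two steps =>  change t1->t3
      - change in both steps                   =>  either outcome (A->B->A or A->B->C)
    """
    c12, c23, c13 = (np.asarray(c, dtype=bool) for c in (c12, c23, c13))
    stable_but_changed = ~c12 & ~c23 & c13
    one_step_but_unchanged = (c12 ^ c23) & ~c13
    return stable_but_changed | one_step_but_unchanged


# Toy pairwise predictions, as if each date pair had been predicted independently.
c12 = np.array([[0, 1], [0, 1]], dtype=bool)
c23 = np.array([[0, 0], [1, 1]], dtype=bool)
c13 = np.array([[1, 1], [0, 0]], dtype=bool)
violations = transitivity_violations(c12, c23, c13)

# Change maps derived from one consistent sequence of semantic maps never violate the rules.
rng = np.random.default_rng(0)
sem = rng.integers(0, 4, size=(3, 8, 8))  # 3 dates, 4 hypothetical classes
derived = transitivity_violations(
    change_map(sem[0], sem[1]),
    change_map(sem[1], sem[2]),
    change_map(sem[0], sem[2]),
)
```

In the toy example, the two pixels in the first column are flagged: one reports a t1-to-t3 change although neither step changed, and the other reports no t1-to-t3 change although exactly one step changed. Predicting the date pairs independently can produce such contradictions, whereas change maps derived from a single consistent label sequence cannot.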


Source journal

ISPRS Journal of Photogrammetry and Remote Sensing (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore: 21.00

Self-citation rate: 6.30%

Articles published: 273

Review time: 40 days

About the journal: The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive. P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields. In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.