ACWCD: Utilizing Inherent Transformers Information and Prior Knowledge for Weakly Supervised Change Detection
Authors: Wenhao Liu; Zhuoyuan Yu; Bin Luo
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-14 (impact factor 8.6; JCR Q1, Engineering, Electrical & Electronic)
Published: 2025-01-08
DOI: 10.1109/TGRS.2025.3527009
URL: https://ieeexplore.ieee.org/document/10833791/
Citations: 0
Abstract
Change detection (CD) using deep learning is crucial for analyzing changes on the Earth’s surface. Yet, obtaining accurate, extensive pixel-level labels is difficult and time-consuming. Consequently, there is growing interest in weakly supervised CD (WSCD) using image-level labels, praised for its high efficiency in label acquisition. Nonetheless, the lack of adequate supervision leads many existing WSCD methods to adopt intricate processes, neglecting the inherent information present in the networks. To overcome these challenges, we propose ACWCD, an end-to-end encoder-decoder framework based on transformers for WSCD using image-level labels. The proposed framework is primarily designed for vision transformer (ViT)-related backbones. It generates effective pseudo labels by tapping into the localizing prowess of class activation maps (CAMs) and simultaneously utilizes these labels for pixel-level supervision during training. Specifically, ACWCD comprises two pivotal components: the attention refinement (AR) module and the change priori (CP) constraint. By harnessing the inherent multihead self-attention (MHSA) of transformers, the AR module refines CAMs by producing change attention from MHSA, thereby refining the pseudo labels. Furthermore, utilizing prior knowledge, the CP constraint prevents the AR module from processing samples with unchanged image-level labels, thus addressing the issue of AR generating spurious change areas. In addition, an exclusive threshold is assigned to each pair of images to help differentiate pseudo labels. It also imposes penalties based on the proportion of mispredictions using the designed plug-and-play loss function. To validate the performance of ACWCD, experiments are conducted on three high-resolution remote sensing datasets. The outcomes reveal that the proposed framework not only achieves new state-of-the-art (SOTA) performance within the WSCD domain but also exhibits substantial scalability, as it does not involve any complex processes, serving as a useful baseline for future research. The code is available at https://github.com/WenhaoLiu03/ACWCD.
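The pseudo-label pipeline described in the abstract (CAM, attention-based refinement, an unchanged-sample gate, and a per-pair threshold) can be illustrated with a toy sketch. This is not the authors' implementation; their code is in the repository linked above. The function names, the row-normalized attention propagation, and the threshold rule (a fixed fraction `ratio` of the pair's maximum refined score) are all assumptions made here for illustration, since the abstract does not specify them.

```python
from typing import List


def refine_cam(cam: List[float], attention: List[List[float]]) -> List[float]:
    """Propagate flattened CAM scores through a row-normalized attention matrix.

    Hypothetical stand-in for attention-based refinement: each output score is
    an attention-weighted average of the input CAM scores.
    """
    refined = []
    for row in attention:
        total = sum(row)
        # Normalize the row so its weights sum to 1 (skip all-zero rows).
        weights = [w / total for w in row] if total > 0 else row
        refined.append(sum(w * c for w, c in zip(weights, cam)))
    return refined


def pseudo_label(
    cam: List[float],
    attention: List[List[float]],
    image_label: int,
    ratio: float = 0.5,  # assumed hyperparameter, not taken from the paper
) -> List[int]:
    """Return a binary pseudo label (1 = changed) for one image pair."""
    # Prior-knowledge gate: a pair labeled "unchanged" (0) skips refinement
    # and gets an all-background pseudo label.
    if image_label == 0:
        return [0] * len(cam)
    refined = refine_cam(cam, attention)
    peak = max(refined)
    if peak <= 0:
        return [0] * len(cam)
    # Per-pair threshold: computed from this pair's own scores (assumed rule).
    threshold = ratio * peak
    return [1 if v >= threshold else 0 for v in refined]


cam = [0.9, 0.1, 0.8, 0.0]
mixing = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(pseudo_label(cam, mixing, image_label=1))
print(pseudo_label(cam, mixing, image_label=0))
```

In this toy example the attention matrix links the first two positions, so the weak score at position 1 is pulled up by its strongly activated neighbor, while the gate forces an all-zero label for the unchanged pair.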
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.