Beyond Pixel-Level Annotation: Exploring Self-Supervised Learning for Change Detection With Image-Level Supervision
Authors: Maofan Zhao; Xinli Hu; Linlin Zhang; Qingyan Meng; Yuxing Chen; Lorenzo Bruzzone
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-16 (JCR Q1, Engineering, Electrical & Electronic; impact factor 8.6)
Published: 2024-03-19
DOI: 10.1109/TGRS.2024.3379431
URL: https://ieeexplore.ieee.org/document/10476394/
Abstract
Change detection (CD) in high-resolution remote sensing has received considerable attention due to its wide range of applications. Many methods have been proposed in the literature and have achieved excellent performance. However, they are often fully supervised and thus require abundant pixel-level labeled samples, which are time-consuming and labor-intensive to produce. Labeling bi-temporal images is often more complicated than the common single-temporal interpretation. Therefore, this study adopts weakly supervised learning (WSL) to reduce label acquisition costs. However, changed regions are small, fragmented, and similar to the background, which widens the gap between weakly supervised and fully supervised tasks. To address these difficulties, we explore self-supervised methods to construct a WSL framework based on image-level labels for general CD, termed WSLCD in this article. First, we design a double-branch Siamese network that takes the original image pair and the spatially transformed image pair as input and derives embeddings and initial class attention maps (CAMs). Second, mutual learning and equivariant regularization (MLER) are enforced on CAMs from different views, which imposes consistency constraints in confusion regions and lets the CAMs learn from each other based on saliency regions. Furthermore, prototype-based contrastive learning (PCL) is designed so that unreliable pixels can learn from prototypes computed from reliable pixels. PCL includes intraview contrast and cross-view contrast, depending on whether the prototypes and class embeddings are from the same view. With the above strategies, we narrow the gap between image-level weakly supervised CD and fully supervised CD. Experiments are conducted on three CD datasets: CLCD, DSIFN, and GCD. Our method achieves state-of-the-art performance on pseudo-label generation and CD. The code is available at https://github.com/mfzhao1998/WSLCD.
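As a rough illustration of two ideas named in the abstract, the NumPy sketch below (1) measures how far a CAM function is from being equivariant to a spatial transform and (2) averages the embeddings of reliable pixels into class prototypes and scores every pixel against them. It is not the authors' implementation; the linked repository is the reference for that. The horizontal flip, the 0.7 reliability threshold, the temperature `tau`, the toy CAM function, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hflip(x):
    # Assumed spatial transform T: flip along the width (last) axis.
    return x[..., ::-1]

def equivariance_loss(cam_fn, img_a, img_b):
    # Compare T(CAM(x)) with CAM(T(x)); zero when cam_fn is equivariant to T.
    cam_orig = cam_fn(img_a, img_b)                 # view 1: original pair
    cam_trans = cam_fn(hflip(img_a), hflip(img_b))  # view 2: transformed pair
    return float(np.mean(np.abs(hflip(cam_orig) - cam_trans)))

def class_prototypes(emb, cam, thresh=0.7):
    # emb: (D, H, W) pixel embeddings; cam: (H, W) change scores in [0, 1].
    # Reliable pixels (assumed rule): cam <= 1 - thresh or cam >= thresh.
    flat = emb.reshape(emb.shape[0], -1)
    score = cam.reshape(-1)
    masks = (score <= 1.0 - thresh, score >= thresh)  # unchanged, changed
    protos = [flat[:, m].mean(axis=1) if m.any() else np.zeros(flat.shape[0])
              for m in masks]
    return np.stack(protos)  # (2, D)

def prototype_similarity(emb, protos, tau=0.1):
    # Softmax over the cosine similarity of each pixel to each prototype.
    d, h, w = emb.shape
    flat = emb.reshape(d, -1)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    p = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    logits = (p @ flat) / tau                    # (2, H*W)
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return prob.reshape(2, h, w)

def toy_cam(img_a, img_b):
    # Hypothetical stand-in for the Siamese network: clipped mean abs difference.
    return np.clip(np.abs(img_a - img_b).mean(axis=0), 0.0, 1.0)

rng = np.random.default_rng(0)
img_a = rng.random((3, 8, 8))
img_b = img_a.copy()
img_b[:, 2:5, 2:5] += 1.0                 # simulated changed region
cam = toy_cam(img_a, img_b)
emb = np.stack([cam, 1.0 - cam])          # toy 2-D embeddings
protos = class_prototypes(emb, cam)
prob = prototype_similarity(emb, protos)  # per-pixel class probabilities
print(equivariance_loss(toy_cam, img_a, img_b), prob.shape)
```

In this sketch, an intraview comparison scores a view's embeddings against prototypes computed from that same view, while a cross-view comparison would pass prototypes computed from the other view's embeddings and CAM.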
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.