Context perturbation: A Consistent alignment approach for Domain Adaptive Semantic Segmentation

Meiqin Liu, Zilin Wang, Chao Yao, Yao Zhao, Wei Wang, Yunchao Wei

Computer Vision and Image Understanding, Vol. 260, Article 104464 (published 2025-08-25). DOI: 10.1016/j.cviu.2025.104464
URL: https://www.sciencedirect.com/science/article/pii/S1077314225001870
Citations: 0
Abstract
Domain Adaptive Semantic Segmentation (DASS) aims to adapt a pre-trained segmentation model from a labeled source domain to an unlabeled target domain. Previous approaches usually address the domain gap with consistency regularization implemented on augmented data. However, because the augmentations are often performed at the input level with simple linear transformations, the feature representations receive only limited perturbation from these augmented views. As a result, they are not effective for cross-domain consistency learning. In this work, we propose a new augmentation method, namely contextual augmentation, and combine it with contrastive learning at both the pixel and class levels to achieve consistency regularization. We term this methodology Context Perturbation for DASS (CoPDASeg). Specifically, contextual augmentation first combines domain information via class mix and then randomly crops two patches with an overlapping region. To achieve consistency regularization with the two augmented patches, we consider both pixel and class perspectives and propose two parallel contrastive learning paradigms (i.e., pixel-level contrastive learning and class-level contrastive learning). The former aligns pixel-to-pixel feature representations, and the latter aligns class prototypes across domains. Experimental results on representative benchmarks (i.e., GTA5 → Cityscapes and SYNTHIA → Cityscapes) demonstrate that CoPDASeg improves segmentation performance over state-of-the-art methods by a large margin.
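The abstract gives no implementation details, but the two core ideas can be sketched. Below is a minimal, hypothetical illustration of the contextual augmentation it describes: domain information is first combined via class mix (pasting a random subset of source-class regions onto a target image, in the style of ClassMix/DACS), and two patches with a guaranteed overlapping region are then cropped. All function names, tensor shapes, and hyperparameters here are assumptions, not the authors' released code.

```python
import torch

def class_mix(src_img, src_lbl, tgt_img):
    """Paste a random half of the source classes onto the target image.
    src_img, tgt_img: (C, H, W); src_lbl: (H, W) integer labels."""
    classes = torch.unique(src_lbl)
    chosen = classes[torch.randperm(len(classes))[: max(1, len(classes) // 2)]]
    mask = torch.isin(src_lbl, chosen).unsqueeze(0)  # (1, H, W), broadcasts over channels
    return torch.where(mask, src_img, tgt_img), mask

def overlapping_crops(img, crop=256, max_shift=64):
    """Two random crops whose windows differ by at most `max_shift` pixels,
    so they always share an overlapping region."""
    _, H, W = img.shape
    y = torch.randint(0, H - crop - max_shift, (1,)).item()
    x = torch.randint(0, W - crop - max_shift, (1,)).item()
    dy = torch.randint(0, max_shift + 1, (1,)).item()
    dx = torch.randint(0, max_shift + 1, (1,)).item()
    p1 = img[:, y:y + crop, x:x + crop]
    p2 = img[:, y + dy:y + dy + crop, x + dx:x + dx + crop]
    return p1, p2, (dy, dx)  # the offset locates the shared overlap
```

The two contrastive paradigms can likewise be sketched in generic form: pixel-level contrastive learning as a standard InfoNCE loss over features of corresponding pixel locations in the overlapping region, and class-level contrastive learning as cosine alignment of masked-average-pooled class prototypes. These are plausible generic formulations under those assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(f1, f2, tau=0.1):
    """InfoNCE over N corresponding pixels from the two augmented views.
    f1, f2: (N, D); row i of f1 and row i of f2 come from the same location."""
    f1, f2 = F.normalize(f1, dim=1), F.normalize(f2, dim=1)
    logits = f1 @ f2.t() / tau                    # (N, N) similarity matrix
    targets = torch.arange(f1.size(0), device=f1.device)
    return F.cross_entropy(logits, targets)       # positive = same spatial location

def class_prototypes(feat, lbl):
    """Masked average pooling: one prototype per class present in `lbl`.
    feat: (D, H, W); lbl: (H, W)."""
    return {int(c): feat[:, lbl == c].mean(dim=1) for c in torch.unique(lbl)}

def class_alignment_loss(protos_a, protos_b):
    """Pull same-class prototypes from the two views/domains together."""
    shared = protos_a.keys() & protos_b.keys()
    losses = [1 - F.cosine_similarity(protos_a[c], protos_b[c], dim=0) for c in shared]
    return torch.stack(losses).mean() if losses else torch.tensor(0.0)
```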
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems