Coarse-to-fine online latent representations matching for one-stage domain adaptive semantic segmentation

IF 7.6 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition | Pub Date: 2023-10-04 | DOI: 10.1016/j.patcog.2023.110019

Zihao Dong, Sijie Niu, Xizhan Gao, Xiuli Shao

Pattern Recognition, volume 146, Article 110019. https://www.sciencedirect.com/science/article/pii/S0031320323007161

Citations: 0

Abstract

Domain adaptive semantic segmentation matters because collecting large numbers of labeled samples in each new domain is expensive and time-consuming. Recent domain adaptation methods still fall short of supervised learning in performance. Under the hypothesis that semantic features can be shared across domains, this paper proposes a coarse-to-fine online matching architecture (COM) for one-stage domain adaptation. Successive learning stages progressively refine the task in the latent feature space, i.e., the finer set at each component is hierarchically derived from the coarser set of the previous components. These components include cross-domain global prototypes, category and instance matching, and anchor-point contrastive learning, which together achieve self-supervised learning with region-level pseudo labels generated within a single training step. First, feature refinement is performed to realize edge perception and inter-feature augmentation. Then, the coarse-to-fine network fuses global and local consistency matching via specific distribution alignment between the source and target domains. Finally, the adversarial structure controls the uncertainty of the generator's predictions by maximizing classification results and minimizing the discrepancy between two classifiers. The proposed method is evaluated on two unsupervised domain adaptation tasks, i.e., GTA5 $\to$ Cityscapes and SYNTHIA $\to$ Cityscapes. Extensive experiments verify the effectiveness of the proposed COM model and demonstrate its superiority over several state-of-the-art approaches.
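The abstract does not give the loss formulas, so the following is only an illustrative sketch of two ideas it names: class prototypes matched across domains, and a discrepancy measure between two classifiers. Everything here is an assumption rather than the authors' implementation: the function names are hypothetical, prototypes are taken as per-class mean feature vectors, matching uses squared Euclidean distance, and the discrepancy is the mean absolute difference of softmax outputs, a common choice in two-classifier adversarial setups.

```python
import math


def class_prototypes(features, labels, num_classes):
    """Return the mean feature vector per class (None for absent classes)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, lab in zip(features, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += feat[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]


def prototype_matching_loss(src_protos, tgt_protos):
    """Average squared distance between same-class prototypes of both domains."""
    dists = [
        sum((a - b) ** 2 for a, b in zip(ps, pt))
        for ps, pt in zip(src_protos, tgt_protos)
        if ps is not None and pt is not None  # skip classes missing in a domain
    ]
    return sum(dists) / len(dists) if dists else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' class probabilities."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)


# Toy data: source labels are ground truth, target labels stand in for pseudo labels.
src_protos = class_prototypes([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], [0, 0, 1], 2)
tgt_protos = class_prototypes([[2.0, 1.0], [0.0, 2.0]], [0, 1], 2)
loss = prototype_matching_loss(src_protos, tgt_protos)
```

In this toy case the class-0 prototypes differ by one unit along the second feature dimension and the class-1 prototypes coincide, so the matching loss averages those two distances. In an actual training loop the features would come from the segmentation backbone and the target labels from the online pseudo-labeling step described above.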




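Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months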

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.