Semantic-Aware Message Broadcasting for Efficient Unsupervised Domain Adaptation

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2024-08-08. DOI: 10.1109/TIP.2024.3437212

Xin Li, Cuiling Lan, Guoqiang Wei, Zhibo Chen

Volume 33, pages 5340-5353. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10630651/


Abstract

Vision transformers have demonstrated great potential in a wide range of vision tasks. However, they generalize poorly when a distribution shift occurs at test time (i.e., on out-of-distribution data). To mitigate this issue, we propose Semantic-aware Message Broadcasting (SAMB), a method that enables more informative and flexible feature alignment for unsupervised domain adaptation (UDA). Specifically, we study the attention module in the vision transformer and observe that an alignment space built on a single global class token lacks flexibility: the class token exchanges information with all image tokens in the same manner and ignores the rich semantics of different regions. We therefore aim to enrich the alignment features by enabling semantic-aware adaptive message broadcasting. To this end, we introduce a set of learned group tokens as nodes that aggregate global information from all image tokens, while encouraging different group tokens to adaptively focus their message broadcasting on different semantic regions. In this way, the group tokens learn more informative and diverse features for effective domain alignment. Moreover, we systematically study the effects of adversarial-based feature alignment (ADA) and pseudo-label-based self-training (PST) on UDA. We find that a simple two-stage training strategy that combines ADA and PST further improves the adaptation capability of the vision transformer. Extensive experiments on DomainNet, OfficeHome, and VisDA-2017 demonstrate the effectiveness of our methods for UDA.
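The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of several group tokens aggregating information from image tokens through attention. It is not the authors' SAMB implementation. The function name `group_token_aggregate`, the single-head scaled dot-product form, the absence of learned projections, and the toy 2-D tokens are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def group_token_aggregate(group_tokens, image_tokens):
    """Let each group token attend over all image tokens.

    Hypothetical simplification (not from the paper): single head,
    no learned query/key/value projections, scaled dot-product scores.
    Returns the aggregated vector and attention weights per group token.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, attention = [], []
    for g in group_tokens:
        # Attention of this group token over every image token.
        weights = softmax([dot(g, t) * scale for t in image_tokens])
        # Weighted sum of image tokens: the message gathered by this group token.
        aggregated = [
            sum(w * t[d] for w, t in zip(weights, image_tokens))
            for d in range(dim)
        ]
        outputs.append(aggregated)
        attention.append(weights)
    return outputs, attention


# Toy input: four image tokens forming two "regions" in a 2-D feature space.
image_tokens = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
# Two group tokens, each (by construction) aligned with one region.
group_tokens = [[4.0, 0.0], [0.0, 4.0]]

outputs, attention = group_token_aggregate(group_tokens, image_tokens)
for k, weights in enumerate(attention):
    print(f"group token {k} attention:", [round(w, 4) for w in weights])
```

In this toy setting each group token places most of its attention on a different pair of image tokens, which loosely mirrors the abstract's goal of having different group tokens focus on different semantic regions. A single class token, by contrast, would produce only one such aggregated vector for alignment.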

