COLA: Context-Aware Language-Driven Test-Time Adaptation

Impact Factor: 13.7
Authors: Aiming Zhang; Tianyuan Yu; Liang Bai; Jun Tang; Yanming Guo; Yirun Ruan; Yun Zhou; Zhihe Lu
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 6002-6015
DOI: 10.1109/TIP.2025.3607634
Publication date: 2025-09-19
Article page: https://ieeexplore.ieee.org/document/11174099/
Citation count: 0

Abstract

Test-time adaptation (TTA) has gained increasing popularity due to its efficacy in addressing the "distribution shift" issue while simultaneously protecting data privacy. However, most prior methods assume the availability of a source-domain model paired with a target domain that shares the same label space, heavily limiting their applicability. In this paper, we investigate a more general source model capable of adapting to multiple target domains without requiring shared labels. This is achieved by using a pre-trained vision-language model (VLM), e.g., CLIP, that can recognize images by matching them with class descriptions. While the zero-shot performance of VLMs is impressive, they struggle to effectively capture the distinctive attributes of a target domain. To this end, we propose a novel method, Context-aware Language-driven TTA (COLA). The proposed method incorporates a lightweight context-aware module that consists of three key components: a task-aware adapter, a context-aware unit, and a residual connection unit, for exploring task-specific knowledge, domain-specific knowledge from the VLM, and prior knowledge of the VLM, respectively. It is worth noting that the context-aware module can be seamlessly integrated into a frozen VLM, ensuring both minimal effort and parameter efficiency. Additionally, we introduce a Class-Balanced Pseudo-labeling (CBPL) strategy to mitigate the adverse effects caused by class imbalance. We demonstrate the effectiveness of our method not only in TTA scenarios but also in class generalization tasks. The source code is available at https://github.com/NUDT-Bai-Group/COLA-TTA
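The abstract does not specify the internals of the context-aware module; the sketch below is a minimal NumPy illustration of the described structure only, assuming the task-aware adapter is a small linear map, the context-aware unit is a per-dimension modulation, and the residual path carries the frozen VLM feature through unchanged. All names and shapes here are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def context_aware_module(feat, W_task, w_ctx):
    """Hypothetical sketch of a context-aware module on frozen VLM features.

    feat   : (batch, d) frozen image features (stand-in for CLIP output)
    W_task : (d, d) task-aware adapter, a small linear map
    w_ctx  : (d,)  context-aware unit, a per-dimension modulation
    The residual connection (feat + ...) preserves the VLM's prior
    knowledge; at zero-initialized weights the module is the identity.
    """
    task = feat @ W_task   # task-specific knowledge
    ctx = feat * w_ctx     # domain-specific modulation
    return feat + task + ctx

d = 8
feat = rng.normal(size=(4, d))           # stand-in for frozen CLIP features
W_task = 0.01 * rng.normal(size=(d, d))  # small init keeps output near the prior
w_ctx = 0.01 * rng.normal(size=d)

out = context_aware_module(feat, W_task, w_ctx)
print(out.shape)  # (4, 8)
```

Because only the adapter and modulation weights would be trained while the VLM stays frozen, the added parameter count is O(d^2) per layer touched, consistent with the abstract's claim of parameter efficiency.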