COLA: Context-Aware Language-Driven Test-Time Adaptation

Impact Factor: 13.7
Authors: Aiming Zhang; Tianyuan Yu; Liang Bai; Jun Tang; Yanming Guo; Yirun Ruan; Yun Zhou; Zhihe Lu
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 6002-6015
DOI: 10.1109/TIP.2025.3607634
Publication date: 2025-09-19
Article page: https://ieeexplore.ieee.org/document/11174099/
Citation count: 0

Abstract

Test-time adaptation (TTA) has gained increasing popularity due to its efficacy in addressing the "distribution shift" issue while simultaneously protecting data privacy. However, most prior methods assume the availability of a source-domain model paired with a target domain that shares the same label space, heavily limiting their applicability. In this paper, we investigate a more general source model capable of adapting to multiple target domains without requiring shared labels. This is achieved by using a pre-trained vision-language model (VLM), e.g., CLIP, that can recognize images by matching them with class descriptions. While the zero-shot performance of VLMs is impressive, they struggle to effectively capture the distinctive attributes of a target domain. To this end, we propose a novel method, Context-aware Language-driven TTA (COLA). The proposed method incorporates a lightweight context-aware module that consists of three key components: a task-aware adapter, a context-aware unit, and a residual connection unit, for exploring task-specific knowledge, domain-specific knowledge from the VLM, and prior knowledge of the VLM, respectively. It is worth noting that the context-aware module can be seamlessly integrated into a frozen VLM, ensuring both minimal effort and parameter efficiency. Additionally, we introduce a Class-Balanced Pseudo-labeling (CBPL) strategy to mitigate the adverse effects caused by class imbalance. We demonstrate the effectiveness of our method not only in TTA scenarios but also in class generalization tasks. The source code is available at https://github.com/NUDT-Bai-Group/COLA-TTA
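The abstract does not specify the internals of the context-aware module; the sketch below is a minimal NumPy illustration of the described structure only, assuming the task-aware adapter is a small linear map, the context-aware unit is a per-dimension modulation, and the residual path carries the frozen VLM feature through unchanged. All names and shapes here are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def context_aware_module(feat, W_task, w_ctx):
    """Hypothetical sketch of a context-aware module on frozen VLM features.

    feat   : (batch, d) frozen image features (stand-in for CLIP output)
    W_task : (d, d) task-aware adapter, a small linear map
    w_ctx  : (d,)  context-aware unit, a per-dimension modulation
    The residual connection (feat + ...) preserves the VLM's prior
    knowledge; at zero-initialized weights the module is the identity.
    """
    task = feat @ W_task   # task-specific knowledge
    ctx = feat * w_ctx     # domain-specific modulation
    return feat + task + ctx

d = 8
feat = rng.normal(size=(4, d))           # stand-in for frozen CLIP features
W_task = 0.01 * rng.normal(size=(d, d))  # small init keeps output near the prior
w_ctx = 0.01 * rng.normal(size=d)

out = context_aware_module(feat, W_task, w_ctx)
print(out.shape)  # (4, 8)
```

Because only the adapter and modulation weights would be trained while the VLM stays frozen, the added parameter count is O(d^2) per layer touched, consistent with the abstract's claim of parameter efficiency.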