Task-to-Instance Prompt Learning for Vision-Language Models at Test Time

Authors: Zhihe Lu; Jiawang Bai; Xin Li; Zeyu Xiao; Xinchao Wang
Journal: IEEE Transactions on Image Processing, vol. 34, pp. 1908-1920
DOI: 10.1109/TIP.2025.3546840
Publication date: 2025-03-14
Article URL: https://ieeexplore.ieee.org/document/10925517/
Citations: 0

Abstract

Prompt learning has recently been introduced into the adaptation of pre-trained vision-language models (VLMs) by tuning a set of trainable tokens to replace hand-crafted text templates. Despite the encouraging results achieved, existing methods largely rely on extra annotated data for training. In this paper, we investigate a more realistic scenario in which only unlabeled test data is available. Existing test-time prompt learning methods typically learn a separate prompt for each test sample. However, relying solely on a single sample severely limits the performance of the learned prompts, as it neglects the task-level knowledge that can be gained from multiple samples. To that end, we propose a novel test-time prompt learning method for VLMs, called Task-to-Instance PromPt LEarning (TIPPLE), which adopts a two-stage training strategy to leverage both task- and instance-level knowledge. Specifically, we reformulate the effective online pseudo-labeling paradigm with two tailored components, an auxiliary text classification task and a diversity regularization term, to serve task-oriented prompt learning. The learned task-level prompt is then combined with a tunable residual for each test sample to incorporate instance-level knowledge. We demonstrate the superior performance of TIPPLE on 15 downstream datasets, e.g., an average improvement of 1.87% over the state-of-the-art method with the ViT-B/16 visual backbone. Our code is open-sourced at https://github.com/zhiheLu/TIPPLE.
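To make the two task-level ingredients mentioned in the abstract concrete, below is a minimal NumPy sketch of (a) confidence-based online pseudo-labeling over a batch of test samples and (b) a diversity regularizer computed as the entropy of the batch-mean prediction. This is an illustrative approximation under assumed definitions, not the authors' implementation; the function names, the confidence threshold, and the exact form of the regularizer are hypothetical (see the official repository for the real method).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over class logits."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_labels(logits, threshold=0.7):
    """Online pseudo-labeling (sketch): keep only test samples whose
    top-class probability exceeds a confidence threshold, so that the
    task-level prompt is updated from reliable predictions."""
    probs = softmax(logits)
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = conf >= threshold
    return labels[mask], mask

def diversity_regularizer(logits):
    """Diversity term (sketch): entropy of the batch-mean prediction.
    Maximizing this discourages the pseudo-labels from collapsing onto
    a few classes when adapting without ground-truth labels."""
    mean_p = softmax(logits).mean(axis=0)
    return float(-(mean_p * np.log(mean_p + 1e-12)).sum())
```

In the second stage, the abstract's "tunable residual" can be read as a small per-sample offset added to the frozen task-level prompt embedding (prompt + residual) and optimized for each test instance.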