Dual Adapter Tuning of Vision-Language Models Using Large Language Models.

Impact Factor: 2.9 · CAS Zone 4 · Computer Science
Mohammad Reza Zarei, Abbas Akkasi, Majid Komeili
*International Journal of Computational Intelligence Systems*, vol. 18, no. 1, p. 109. Published 2025 (Epub 2025-05-08). DOI: 10.1007/s44196-025-00853-0. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077310/pdf/
Citations: 0

Abstract

Vision-language models (VLMs) pre-trained on large-scale image-text pairs have shown impressive results in zero-shot vision tasks. The knowledge transferability of these models can be further improved with the help of a limited number of samples. Feature adapter tuning is a prominent approach for efficient transfer learning (ETL). However, most previous ETL models focus on tuning either prior-independent or prior-dependent feature adapters. We propose a novel ETL approach that leverages both adapter styles simultaneously. Additionally, most existing ETL models rely on textual prompts constructed by completing general pre-defined templates. This approach neglects the descriptive knowledge that can assist the VLM through an informative prompt. Instead of pre-defined templates for prompt construction, we use a pre-trained LLM to generate attribute-specific prompts for each visual category. Furthermore, we guide the VLM with context-aware discriminative information generated by the pre-trained LLM to emphasize features that distinguish the most probable candidate classes. The proposed ETL model is evaluated on 11 datasets and sets a new state of the art. Our code and all collected prompts are publicly available at https://github.com/mrzarei5/DATViL.
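To make the "both adapter styles" idea concrete, the fusion of a prior-independent adapter (a small learnable residual MLP on frozen VLM features, in the style of CLIP-Adapter) with a prior-dependent adapter (a key/value cache built from few-shot training features, in the style of Tip-Adapter) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's implementation: random vectors stand in for frozen CLIP image/text embeddings, and all dimensions, weights, and hyperparameters (`ratio`, `beta`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: feature dim, number of classes, shots per class
d, n_cls, shots = 8, 3, 4
rng = np.random.default_rng(0)

# Stand-ins for frozen VLM encoder outputs (normalized, as in CLIP)
img_feat = rng.normal(size=(d,)); img_feat /= np.linalg.norm(img_feat)
txt_feat = rng.normal(size=(n_cls, d))
txt_feat /= np.linalg.norm(txt_feat, axis=1, keepdims=True)

# Prior-independent adapter: learnable bottleneck MLP with a residual blend
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))
def adapter(x, ratio=0.2):
    h = np.maximum(x @ W1, 0) @ W2        # ReLU bottleneck
    h /= np.linalg.norm(h)
    return ratio * h + (1 - ratio) * x    # residual mix with the frozen feature

# Prior-dependent adapter: cache of few-shot features (keys) and one-hot labels (values)
keys = rng.normal(size=(n_cls * shots, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = np.eye(n_cls).repeat(shots, axis=0)
def cache_logits(x, beta=5.0):
    affinity = np.exp(-beta * (1.0 - x @ keys.T))  # similarity to cached shots
    return affinity @ values                        # weighted vote over labels

# Fuse both adapter styles into one set of class logits
x = adapter(img_feat)
logits = 100.0 * (x @ txt_feat.T) + cache_logits(x)
probs = softmax(logits)
```

In a real few-shot pipeline the MLP weights (and optionally the cache keys) would be trained on the labeled shots, while the VLM encoders stay frozen; only the fusion structure is what this sketch is meant to show.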

Source journal: International Journal of Computational Intelligence Systems (Engineering & Technology — Computer Science: Interdisciplinary Applications)
Self-citation rate: 3.40%
Articles published per year: 94
About the journal: The International Journal of Computational Intelligence Systems publishes original research on all aspects of applied computational intelligence, especially papers demonstrating the use of techniques and methods originating from computational intelligence theory. The core theories of computational intelligence are fuzzy logic, neural networks, evolutionary computation, and probabilistic reasoning. The journal publishes only articles related to the use of computational intelligence and broadly covers the following topics: autonomous reasoning, bio-informatics, cloud computing, condition monitoring, data science, data mining, data visualization, decision support systems, fault diagnosis, intelligent information retrieval, human-machine interaction and interfaces, image processing, internet and networks, noise analysis, pattern recognition, prediction systems, power (nuclear) safety systems, process and system control, real-time systems, risk analysis and safety-related issues, robotics, signal and image processing, IoT and smart environments, systems integration, system control, system modelling and optimization, telecommunications, time series prediction, warning systems, virtual reality, web intelligence, and deep learning.