Group-guided prompt learning for vision-language models

IF 7.5 | CAS Tier 1, Computer Science | JCR Q1, Computer Science, Artificial Intelligence
Yufei Zheng, Shengsheng Wang, Yansheng Gao
Journal: Expert Systems with Applications, Volume 298, Article 129846
DOI: 10.1016/j.eswa.2025.129846
Published: 2025-09-24
URL: https://www.sciencedirect.com/science/article/pii/S095741742503461X
Citations: 0

Abstract

Prompt learning has become one of the mainstream approaches for enabling Vision-Language Models (VLMs) to adapt effectively to downstream tasks. Recent approaches have enhanced model generalization by integrating prior knowledge from large language models (LLMs). However, these approaches overlook the potential value of group knowledge derived from semantic correlations across different classes, which may limit model performance on complex downstream tasks. To overcome this challenge, we propose Group-guided Prompt Learning (GGPL), which integrates group knowledge into the original text prompts through LLMs. Specifically, GGPL uses LLMs to group all classes and integrates the group knowledge into the original text prompts to construct the final text prompts. Furthermore, we introduce a novel Group Knowledge Alignment (GKA) module, which aligns the learnable prompt features with the pre-trained features that contain group knowledge, preventing the learnable prompt features from shifting during training and thus reducing overfitting. Experimental results across 11 public datasets demonstrate that the proposed GGPL method achieves significant improvements over various prompt learning approaches, while extensive ablation experiments also demonstrate the effectiveness of each component of our GGPL method.
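The abstract describes two ingredients: folding LLM-derived class groups into text prompts, and an alignment term that keeps learnable prompt features close to frozen, group-aware features. A minimal sketch of both, assuming details the abstract does not give (the prompt template wording and the exact form of the GKA loss are guesses, not the paper's actual design):

```python
import math

def build_group_prompts(groups):
    """Fold LLM-derived group knowledge into per-class text prompts.

    `groups` maps a group name to its member class names, e.g. the output
    of asking an LLM to cluster the label set into semantic groups.
    """
    prompts = {}
    for group, classes in groups.items():
        for cls in classes:
            # Template is an assumption; the paper's exact wording may differ.
            prompts[cls] = f"a photo of a {cls}, a kind of {group}."
    return prompts

def alignment_penalty(learnable, frozen):
    """Stand-in for the GKA idea: penalize drift of a learnable prompt
    feature away from the frozen group-aware feature via cosine distance."""
    dot = sum(a * b for a, b in zip(learnable, frozen))
    na = math.sqrt(sum(a * a for a in learnable))
    nb = math.sqrt(sum(b * b for b in frozen))
    return 1.0 - dot / (na * nb)

# Toy usage: two LLM-suggested groups over three classes.
groups = {"dog": ["beagle", "husky"], "bird": ["sparrow"]}
prompts = build_group_prompts(groups)
```

In a real implementation the penalty would be computed on CLIP text-encoder outputs and added to the task loss; this sketch only illustrates the shape of the idea.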
Source journal

Expert Systems with Applications (Engineering Technology – Electrical & Electronic Engineering)
CiteScore: 13.80
Self-citation rate: 10.60%
Annual publications: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.