Group-guided prompt learning for vision-language models

IF 7.5 | CAS Tier 1, Computer Science | JCR Q1, Computer Science, Artificial Intelligence
Yufei Zheng, Shengsheng Wang, Yansheng Gao
Journal: Expert Systems with Applications, Volume 298, Article 129846
DOI: 10.1016/j.eswa.2025.129846
Published: 2025-09-24
URL: https://www.sciencedirect.com/science/article/pii/S095741742503461X
Citations: 0

Abstract

Prompt learning has become one of the mainstream approaches for enabling Vision-Language Models (VLMs) to adapt effectively to downstream tasks. Recent approaches have enhanced model generalization by integrating prior knowledge from large language models (LLMs). However, these approaches overlook the potential value of group knowledge derived from semantic correlations across different classes, which may limit model performance on complex downstream tasks. To overcome this challenge, we propose Group-guided Prompt Learning (GGPL), which integrates group knowledge into the original text prompts through LLMs. Specifically, GGPL uses LLMs to group all classes and integrates the group knowledge into the original text prompts to construct the final text prompts. Furthermore, we introduce a novel Group Knowledge Alignment (GKA) module, which aligns the learnable prompt features with the pre-trained features that contain group knowledge, preventing the learnable prompt features from shifting during training and thus reducing overfitting. Experimental results across 11 public datasets demonstrate that the proposed GGPL method achieves significant improvements over various prompt learning approaches, while extensive ablation experiments also demonstrate the effectiveness of each component of our GGPL method.
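The abstract describes two ingredients: folding LLM-derived class groups into text prompts, and an alignment term that keeps learnable prompt features close to frozen, group-aware features. A minimal sketch of both, assuming details the abstract does not give (the prompt template wording and the exact form of the GKA loss are guesses, not the paper's actual design):

```python
import math

def build_group_prompts(groups):
    """Fold LLM-derived group knowledge into per-class text prompts.

    `groups` maps a group name to its member class names, e.g. the output
    of asking an LLM to cluster the label set into semantic groups.
    """
    prompts = {}
    for group, classes in groups.items():
        for cls in classes:
            # Template is an assumption; the paper's exact wording may differ.
            prompts[cls] = f"a photo of a {cls}, a kind of {group}."
    return prompts

def alignment_penalty(learnable, frozen):
    """Stand-in for the GKA idea: penalize drift of a learnable prompt
    feature away from the frozen group-aware feature via cosine distance."""
    dot = sum(a * b for a, b in zip(learnable, frozen))
    na = math.sqrt(sum(a * a for a in learnable))
    nb = math.sqrt(sum(b * b for b in frozen))
    return 1.0 - dot / (na * nb)

# Toy usage: two LLM-suggested groups over three classes.
groups = {"dog": ["beagle", "husky"], "bird": ["sparrow"]}
prompts = build_group_prompts(groups)
```

In a real implementation the penalty would be computed on CLIP text-encoder outputs and added to the task loss; this sketch only illustrates the shape of the idea.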
Source journal

Expert Systems with Applications (Engineering Technology – Electrical & Electronic Engineering)
CiteScore: 13.80
Self-citation rate: 10.60%
Annual publications: 2045
Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.