{"title":"AsPrompt: Attribute-structured knowledge-guided dual-modal coupling prompt learning for few-shot image classification","authors":"Zhiyong Deng, Ronggui Wang, Lixia Xue, Juan Yang","doi":"10.1016/j.displa.2025.103217","DOIUrl":null,"url":null,"abstract":"<div><div>The few-shot image classification task involves classifying images when only a limited number of training images are available. This field has seen significant advancements in recent years due to the development of pre-trained vision-language models (e.g., CLIP), which exhibit strong generalization capabilities. Recent studies have further leveraged classes-related descriptions as part of prompt learning to better adapt these foundational vision-language models for downstream tasks. However, the textual descriptions used in traditional methods often lack sufficient class-discriminative information, limiting the model’s expressiveness on unseen data domains. Given that large language models possess rich structured knowledge bases, they offer new avenues for enhancing textual information. Against this backdrop, we propose a novel method called AsPrompt, which integrates attribute-structured knowledge guidance with a dual-modal coupling prompt learning mechanism. This approach not only enriches class-discriminative textual information but also effectively integrates structured knowledge with traditional textual information by capturing the structured relationships between entity sets and attribute sets. Experimental results demonstrate that AsPrompt surpasses other state-of-the-art prompt learning methods on 11 different few-shot image classification datasets, showcasing its superior performance. The code can be found at <span><span>https://github.com/SandyPrompt/AsPrompt</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103217"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002549","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
The few-shot image classification task involves classifying images when only a limited number of training images are available. This field has seen significant advancements in recent years due to the development of pre-trained vision-language models (e.g., CLIP), which exhibit strong generalization capabilities. Recent studies have further leveraged class-related descriptions as part of prompt learning to better adapt these foundational vision-language models to downstream tasks. However, the textual descriptions used in traditional methods often lack sufficient class-discriminative information, limiting the model’s expressiveness on unseen data domains. Given that large language models possess rich structured knowledge bases, they offer new avenues for enriching textual information. Against this backdrop, we propose a novel method called AsPrompt, which integrates attribute-structured knowledge guidance with a dual-modal coupling prompt learning mechanism. This approach not only enriches class-discriminative textual information but also effectively integrates structured knowledge with traditional textual information by capturing the structured relationships between entity sets and attribute sets. Experimental results demonstrate that AsPrompt surpasses other state-of-the-art prompt learning methods on 11 different few-shot image classification datasets, showcasing its superior performance. The code can be found at https://github.com/SandyPrompt/AsPrompt.
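To make the general idea concrete, below is a minimal sketch (not the authors' implementation) of attribute-enriched prompting with CLIP: each class name is expanded with attribute phrases, and the resulting text embeddings are matched against image embeddings. The attribute lists here are hard-coded placeholders, whereas AsPrompt obtains structured attribute knowledge from a large language model and additionally learns dual-modal coupling prompts, neither of which is reproduced in this sketch. The model name and prompt template are illustrative assumptions.

```python
# Hedged sketch: attribute-enriched class prompts scored with off-the-shelf CLIP
# (Hugging Face transformers). This is NOT AsPrompt itself; it only illustrates
# how attribute phrases can enrich class-discriminative text information.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical attribute sets per class; AsPrompt would derive such structured
# attributes from a large language model rather than hard-coding them.
class_attributes = {
    "sparrow": ["a short conical beak", "brown streaked plumage"],
    "eagle":   ["a hooked yellow beak", "broad powerful wings"],
}

# Build one attribute-enriched prompt per (class, attribute) pair.
prompts, owners = [], []
for cls, attrs in class_attributes.items():
    for attr in attrs:
        prompts.append(f"a photo of a {cls}, which has {attr}.")
        owners.append(cls)

with torch.no_grad():
    text_inputs = processor(text=prompts, return_tensors="pt", padding=True)
    text_feats = model.get_text_features(**text_inputs)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

    # Average the prompt embeddings of each class into one classifier weight.
    classes = list(class_attributes)
    class_feats = torch.stack([
        text_feats[[i for i, o in enumerate(owners) if o == cls]].mean(dim=0)
        for cls in classes
    ])
    class_feats = class_feats / class_feats.norm(dim=-1, keepdim=True)

    # Placeholder image; replace with a real query image from the few-shot task.
    image = Image.new("RGB", (224, 224))
    image_inputs = processor(images=image, return_tensors="pt")
    image_feats = model.get_image_features(**image_inputs)
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)

    logits = image_feats @ class_feats.T  # cosine similarities per class
    print(dict(zip(classes, logits[0].tolist())))
```

In this toy setup the attribute prompts simply refine the zero-shot text classifier; AsPrompt goes further by learning coupled prompts in both the text and image branches so that the structured knowledge also shapes the visual representation.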
About the journal
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.