{"title":"AsPrompt: Attribute-structured knowledge-guided dual-modal coupling prompt learning for few-shot image classification","authors":"Zhiyong Deng, Ronggui Wang, Lixia Xue, Juan Yang","doi":"10.1016/j.displa.2025.103217","DOIUrl":null,"url":null,"abstract":"<div><div>The few-shot image classification task involves classifying images when only a limited number of training images are available. This field has seen significant advancements in recent years due to the development of pre-trained vision-language models (e.g., CLIP), which exhibit strong generalization capabilities. Recent studies have further leveraged classes-related descriptions as part of prompt learning to better adapt these foundational vision-language models for downstream tasks. However, the textual descriptions used in traditional methods often lack sufficient class-discriminative information, limiting the model’s expressiveness on unseen data domains. Given that large language models possess rich structured knowledge bases, they offer new avenues for enhancing textual information. Against this backdrop, we propose a novel method called AsPrompt, which integrates attribute-structured knowledge guidance with a dual-modal coupling prompt learning mechanism. This approach not only enriches class-discriminative textual information but also effectively integrates structured knowledge with traditional textual information by capturing the structured relationships between entity sets and attribute sets. Experimental results demonstrate that AsPrompt surpasses other state-of-the-art prompt learning methods on 11 different few-shot image classification datasets, showcasing its superior performance. The code can be found at <span><span>https://github.com/SandyPrompt/AsPrompt</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103217"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002549","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
The few-shot image classification task involves classifying images when only a limited number of training images are available. This field has seen significant advancements in recent years due to the development of pre-trained vision-language models (e.g., CLIP), which exhibit strong generalization capabilities. Recent studies have further leveraged class-related descriptions as part of prompt learning to better adapt these foundational vision-language models to downstream tasks. However, the textual descriptions used in traditional methods often lack sufficient class-discriminative information, limiting the model’s expressiveness on unseen data domains. Given that large language models possess rich structured knowledge bases, they offer new avenues for enriching textual information. Against this backdrop, we propose a novel method called AsPrompt, which integrates attribute-structured knowledge guidance with a dual-modal coupling prompt learning mechanism. This approach not only enriches class-discriminative textual information but also effectively integrates structured knowledge with traditional textual information by capturing the structured relationships between entity sets and attribute sets. Experimental results demonstrate that AsPrompt surpasses other state-of-the-art prompt learning methods on 11 different few-shot image classification datasets, showcasing its superior performance. The code can be found at https://github.com/SandyPrompt/AsPrompt.
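To make the general idea concrete, below is a minimal sketch (not the authors' implementation) of attribute-enriched prompting with CLIP: each class name is expanded with attribute phrases, and the resulting text embeddings are matched against image embeddings. The attribute lists here are hard-coded placeholders, whereas AsPrompt obtains structured attribute knowledge from a large language model and additionally learns dual-modal coupling prompts, neither of which is reproduced in this sketch. The model name and prompt template are illustrative assumptions.

```python
# Hedged sketch: attribute-enriched class prompts scored with off-the-shelf CLIP
# (Hugging Face transformers). This is NOT AsPrompt itself; it only illustrates
# how attribute phrases can enrich class-discriminative text information.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical attribute sets per class; AsPrompt would derive such structured
# attributes from a large language model rather than hard-coding them.
class_attributes = {
    "sparrow": ["a short conical beak", "brown streaked plumage"],
    "eagle":   ["a hooked yellow beak", "broad powerful wings"],
}

# Build one attribute-enriched prompt per (class, attribute) pair.
prompts, owners = [], []
for cls, attrs in class_attributes.items():
    for attr in attrs:
        prompts.append(f"a photo of a {cls}, which has {attr}.")
        owners.append(cls)

with torch.no_grad():
    text_inputs = processor(text=prompts, return_tensors="pt", padding=True)
    text_feats = model.get_text_features(**text_inputs)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

    # Average the prompt embeddings of each class into one classifier weight.
    classes = list(class_attributes)
    class_feats = torch.stack([
        text_feats[[i for i, o in enumerate(owners) if o == cls]].mean(dim=0)
        for cls in classes
    ])
    class_feats = class_feats / class_feats.norm(dim=-1, keepdim=True)

    # Placeholder image; replace with a real query image from the few-shot task.
    image = Image.new("RGB", (224, 224))
    image_inputs = processor(images=image, return_tensors="pt")
    image_feats = model.get_image_features(**image_inputs)
    image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)

    logits = image_feats @ class_feats.T  # cosine similarities per class
    print(dict(zip(classes, logits[0].tolist())))
```

In this toy setup the attribute prompts simply refine the zero-shot text classifier; AsPrompt goes further by learning coupled prompts in both the text and image branches so that the structured knowledge also shapes the visual representation.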
About the journal
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.