CoAPT: Context Attribute words for Prompt Tuning
Authors: Gun Lee, Subin An, Sungyong Baik, Soochahn Lee
Journal: Knowledge-Based Systems, Volume 320, Article 113653 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.2)
Publication date: 2025-05-07
Publication type: Journal Article
DOI: 10.1016/j.knosys.2025.113653
URL: https://www.sciencedirect.com/science/article/pii/S0950705125006999
Open access: No
Citations: 0
Abstract
We propose a novel prompt tuning method called CoAPT (Context Attribute words in Prompt Tuning) for few/zero-shot image classification. The core motivation is that attributes are descriptive words with rich information about a given concept. Thus, we aim to enrich text queries of existing prompt tuning methods, improving alignment between text and image embeddings in CLIP embedding space. To do so, CoAPT integrates attribute words as additional prompts within learnable prompt tuning and can be easily incorporated into various existing prompt tuning methods. To facilitate the incorporation of attributes into text embeddings and the alignment with image embeddings, soft prompts are trained together with an additional meta-network that generates input-image-wise feature biases from the concatenated feature encodings of the image–text combined queries. Our experiments demonstrate that CoAPT leads to considerable improvements for existing baseline methods on several few/zero-shot image classification tasks, including base-to-novel generalization, cross-dataset transfer, and domain generalization. Our findings highlight the importance of combining hard and soft prompts and pave the way for future research on the interplay between text and image latent spaces in pre-trained models.
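As a rough illustration of the two ideas described above (attribute words placed alongside learnable soft-prompt tokens, and a meta-network that maps concatenated image and text features to a per-image bias), the following pure-Python sketch uses toy vectors in place of CLIP encodings. The function names, the single linear layer, the choice to add the bias to the text feature, and all numbers are assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def build_prompt(num_soft_tokens, attribute_words, class_name):
    """Token layout: soft-prompt placeholders, then attribute words, then the class name."""
    soft = [f"<ctx{i}>" for i in range(num_soft_tokens)]
    return soft + list(attribute_words) + [class_name]


def linear(weights, bias, x):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def meta_net_bias(image_feat, text_feat, weights, bias):
    """Hypothetical one-layer meta-network: concat(image, text) -> bias vector."""
    return linear(weights, bias, image_feat + text_feat)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(image_feat, class_text_feats, weights, bias):
    """Score each class by cosine similarity after adding the image-conditioned bias.

    Assumption: the bias is added to the text feature; the abstract does not say where.
    """
    scores = {}
    for name, text_feat in class_text_feats.items():
        delta = meta_net_bias(image_feat, text_feat, weights, bias)
        shifted = [t + d for t, d in zip(text_feat, delta)]
        scores[name] = cosine(image_feat, shifted)
    return max(scores, key=scores.get), scores


# Toy stand-ins for CLIP image/text encodings (made-up values).
DIM = 4
rng = random.Random(0)
weights = [[rng.uniform(-0.01, 0.01) for _ in range(2 * DIM)] for _ in range(DIM)]
bias = [0.0] * DIM
image_feat = [0.9, 0.1, 0.0, 0.2]
class_text_feats = {"cat": [1.0, 0.0, 0.0, 0.1], "car": [0.0, 1.0, 0.3, 0.0]}

prediction, scores = classify(image_feat, class_text_feats, weights, bias)
print(build_prompt(2, ["whiskers", "furry"], "cat"))
print(prediction, scores)
```

In this sketch the meta-network weights are tiny random values, so the scores stay close to plain cosine similarity. In the method described above, the soft prompts and the meta-network are trained together, which this toy example does not attempt.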
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.