GPSP-CLIP: learning generic pseudo-state prompts for flexible zero-shot anomaly detection

IF 3.5 · Zone 2 (Computer Science) · JCR Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Intelligence Pub Date : 2025-10-07 DOI:10.1007/s10489-025-06843-1

Weiyu Hu, Shubo Zhou, Yongbin Gao, Xue-Qin Jiang

Applied Intelligence, volume 55, issue 15 (2025). Article page: https://link.springer.com/article/10.1007/s10489-025-06843-1


Abstract

Large-scale foundation models such as Contrastive Language-Image Pre-training (CLIP) have shown great potential in zero-shot anomaly detection (ZSAD), allowing a single model to generalize to unseen categories without fine-tuning on specific classes. However, existing ZSAD methods often rely on rigid prompt designs, which makes them difficult to adapt to the diverse characteristics of industrial products. Additionally, the need to manually define category-specific and state-specific prompts limits their scalability and generalization. This paper proposes a generic pseudo-state prompting model based on CLIP (GPSP-CLIP) to address these challenges. The motivation behind GPSP-CLIP is to develop a flexible prompting method capable of representing normal and anomalous conditions across various applications without relying on predefined text prompts. Technically, GPSP-CLIP employs fully learnable parameters to generate broad, pseudo-state text features, enabling generalization across different industrial contexts. GPSP-CLIP uses distinct prompt learning strategies for anomaly classification and anomaly segmentation, so each task is optimized independently. This enables the model to capture high-level semantics through global prompts while identifying fine-grained defect patterns via local prompts. Experimental results on the MVTec and VisA datasets show a 1.8% improvement in AP for anomaly classification and a 1.3% gain in AUPRO for anomaly segmentation compared to state-of-the-art methods.
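The abstract describes scoring with two pseudo-state text features (normal and anomalous), using global prompts for image-level classification and local prompts for pixel-level segmentation. The following is a minimal sketch of that general CLIP-style scoring idea, not the authors' implementation. The function names, the toy 2-D feature vectors, and the temperature of 0.07 are all assumptions made for illustration; in the actual method the text features would presumably come from learnable prompt parameters passed through CLIP's text encoder, which is not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def anomaly_probability(feature, normal_text, anomalous_text, temperature=0.07):
    """Softmax over similarities to the two pseudo-state text features.

    Returns the probability assigned to the anomalous state.
    The temperature value is an assumption, not taken from the paper.
    """
    s_normal = cosine(feature, normal_text) / temperature
    s_anomalous = cosine(feature, anomalous_text) / temperature
    m = max(s_normal, s_anomalous)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - m)
    e_anomalous = math.exp(s_anomalous - m)
    return e_anomalous / (e_normal + e_anomalous)

def classify_image(global_feature, global_prompts):
    """Image-level anomaly score from the global image feature."""
    normal_text, anomalous_text = global_prompts
    return anomaly_probability(global_feature, normal_text, anomalous_text)

def segment_image(patch_features, local_prompts):
    """Patch-level anomaly map from a 2-D grid of patch features."""
    normal_text, anomalous_text = local_prompts
    return [
        [anomaly_probability(p, normal_text, anomalous_text) for p in row]
        for row in patch_features
    ]

# Toy stand-ins for learned pseudo-state text features (hypothetical values).
normal_t = [1.0, 0.0]
anomalous_t = [0.0, 1.0]

image_score = classify_image([0.2, 0.9], (normal_t, anomalous_t))
patch_map = segment_image([[[0.9, 0.1], [0.1, 0.9]]], (normal_t, anomalous_t))
```

Keeping separate prompt pairs for `classify_image` and `segment_image` mirrors the abstract's point that the two tasks use distinct prompts; the sketch passes the same toy vectors to both only for brevity.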


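Source journal

Applied Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months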

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.