GPSP-CLIP: learning generic pseudo-state prompts for flexible zero-shot anomaly detection
Weiyu Hu, Shubo Zhou, Yongbin Gao, Xue-Qin Jiang
Applied Intelligence, Volume 55, Issue 15 (journal article, published 2025-10-07)
DOI: 10.1007/s10489-025-06843-1
https://link.springer.com/article/10.1007/s10489-025-06843-1
Citations: 0
Abstract
Large-scale foundation models such as Contrastive Language-Image Pre-training (CLIP) have shown great potential in the zero-shot anomaly detection (ZSAD) task, allowing a single model to generalize to unseen categories without fine-tuning on specific classes. However, existing ZSAD methods often rely on rigid prompt designs, which makes them difficult to adapt to the diverse characteristics of industrial products. Additionally, the need to manually define category-specific and state-specific prompts limits their scalability and generalization. This paper proposes a generic pseudo-state prompting model based on CLIP (GPSP-CLIP) to address these challenges. The motivation behind GPSP-CLIP is to develop a flexible prompting method capable of representing normal and anomalous conditions across various applications without relying on predefined text prompts. Technically, GPSP-CLIP employs fully learnable parameters to generate broad pseudo-state text features, enabling generalization across different industrial contexts. By employing distinct prompt learning strategies for anomaly classification and segmentation, GPSP-CLIP optimizes each task independently. This enables the model to capture high-level semantics through global prompts while identifying fine-grained defect patterns via local prompts. Experimental results on the well-known MVTec and VisA datasets demonstrate improved performance, with a 1.8% improvement in AP for anomaly classification and a 1.3% gain in AUPRO for anomaly segmentation compared with state-of-the-art methods.
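The abstract does not give the scoring equations, so the following is only a minimal sketch, assuming the common CLIP-style zero-shot scheme: an embedding is compared by cosine similarity with a "normal" and an "anomalous" text embedding, and a temperature-scaled two-way softmax turns the two similarities into an anomaly probability. A global image embedding scored against one prompt pair gives the classification score; patch embeddings scored against a separate pair give a segmentation map, loosely mirroring the abstract's split between global and local prompts. `PseudoStatePrompts`, `anomaly_probability`, `detect`, the toy vectors, and the temperature of 0.07 are hypothetical; the learnable prompt parameters, the CLIP encoders, and training are not shown and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]


@dataclass
class PseudoStatePrompts:
    """Hypothetical container for one pair of pseudo-state text embeddings.

    In a prompt-learning setup these would come from passing learnable
    context vectors through CLIP's text encoder; here they are plain vectors.
    """
    normal: Vector
    anomalous: Vector


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def anomaly_probability(embedding: Sequence[float], prompts: PseudoStatePrompts,
                        temperature: float = 0.07) -> float:
    """Two-way softmax over {normal, anomalous}; returns P(anomalous).

    The temperature value is an assumption for illustration.
    """
    logit_n = cosine(embedding, prompts.normal) / temperature
    logit_a = cosine(embedding, prompts.anomalous) / temperature
    m = max(logit_n, logit_a)  # subtract the max for numerical stability
    exp_n, exp_a = math.exp(logit_n - m), math.exp(logit_a - m)
    return exp_a / (exp_n + exp_a)


def detect(global_embedding: Sequence[float],
           patch_embeddings: Sequence[Sequence[float]],
           global_prompts: PseudoStatePrompts,
           local_prompts: PseudoStatePrompts,
           temperature: float = 0.07) -> Tuple[float, List[float]]:
    """Image-level score from the global pair, per-patch map from the local pair."""
    image_score = anomaly_probability(global_embedding, global_prompts, temperature)
    patch_map = [anomaly_probability(p, local_prompts, temperature)
                 for p in patch_embeddings]
    return image_score, patch_map


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative (real CLIP embeddings are much larger).
    global_pair = PseudoStatePrompts(normal=[1.0, 0.0], anomalous=[0.0, 1.0])
    local_pair = PseudoStatePrompts(normal=[1.0, 0.2], anomalous=[0.2, 1.0])
    score, amap = detect([0.9, 0.1], [[1.0, 0.0], [0.1, 0.9]], global_pair, local_pair)
    print(f"image score={score:.3f}, patch map={[round(x, 3) for x in amap]}")
```

Keeping two independent prompt pairs lets image-level and pixel-level scoring be tuned separately; how GPSP-CLIP actually parameterizes and optimizes its prompts is not specified in the abstract.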
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions to real-life manufacturing, defense, management, government and industrial problems that are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.