{"title":"Personalizing Vision-Language Models With Hybrid Prompts for Zero-Shot Anomaly Detection","authors":"Yunkang Cao;Xiaohao Xu;Yuqi Cheng;Chen Sun;Zongwei Du;Liang Gao;Weiming Shen","doi":"10.1109/TCYB.2025.3536165","DOIUrl":null,"url":null,"abstract":"Zero-shot anomaly detection (ZSAD) aims to develop a foundational model capable of detecting anomalies across arbitrary categories without relying on reference images. However, since “abnormality” is inherently defined in relation to “normality” within specific categories, detecting anomalies without reference images describing the corresponding normal context remains a significant challenge. As an alternative to reference images, this study explores the use of widely available product standards to characterize normal contexts and potential abnormal states. Specifically, this study introduces AnomalyVLM, which leverages generalized pretrained vision-language models (VLMs) to interpret these standards and detect anomalies. Given the current limitations of VLMs in comprehending complex textual information, AnomalyVLM generates hybrid prompts—comprising prompts for abnormal regions, symbolic rules, and region numbers—from the standards to facilitate more effective understanding. These hybrid prompts are incorporated into various stages of the anomaly detection process within the selected VLMs, including an anomaly region generator and an anomaly region refiner. By utilizing hybrid prompts, VLMs are personalized as anomaly detectors for specific categories, offering users flexibility and control in detecting anomalies across novel categories without the need for training data. Experimental results on four public industrial anomaly detection datasets, as well as a practical automotive part inspection task, highlight the superior performance and enhanced generalization capability of AnomalyVLM, especially in texture categories. An online demo of AnomalyVLM is available at <uri>https://github.com/caoyunkang/Segment-Any-Anomaly</uri>.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 4","pages":"1917-1929"},"PeriodicalIF":9.4000,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cybernetics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10884560/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Zero-shot anomaly detection (ZSAD) aims to develop a foundational model capable of detecting anomalies across arbitrary categories without relying on reference images. However, since “abnormality” is inherently defined in relation to “normality” within specific categories, detecting anomalies without reference images describing the corresponding normal context remains a significant challenge. As an alternative to reference images, this study explores the use of widely available product standards to characterize normal contexts and potential abnormal states. Specifically, this study introduces AnomalyVLM, which leverages generalized pretrained vision-language models (VLMs) to interpret these standards and detect anomalies. Given the current limitations of VLMs in comprehending complex textual information, AnomalyVLM generates hybrid prompts—comprising prompts for abnormal regions, symbolic rules, and region numbers—from the standards to facilitate more effective understanding. These hybrid prompts are incorporated into various stages of the anomaly detection process within the selected VLMs, including an anomaly region generator and an anomaly region refiner. By utilizing hybrid prompts, VLMs are personalized as anomaly detectors for specific categories, offering users flexibility and control in detecting anomalies across novel categories without the need for training data. Experimental results on four public industrial anomaly detection datasets, as well as a practical automotive part inspection task, highlight the superior performance and enhanced generalization capability of AnomalyVLM, especially in texture categories. An online demo of AnomalyVLM is available at https://github.com/caoyunkang/Segment-Any-Anomaly.
期刊介绍:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or machine, human, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.