MALM-CLIP: A generative multi-agent framework for multimodal fusion in few-shot industrial anomaly detection

Hanzhi Chen, Jingbin Que, Kexin Zhu, Zhide Chen, Fei Zhu, Wencheng Yang, Xu Yang, Xuechao Yang

Information Fusion, Volume 127, Article 103765. Published 2025-09-22. DOI: 10.1016/j.inffus.2025.103765
Citations: 0
Abstract
The Contrastive Language-Image Pre-training (CLIP) model has significantly improved few-shot industrial anomaly detection. However, existing approaches often rely on manually crafted visual description texts, which lack robustness and generalizability in real-world production settings: such methods struggle to adapt to new or evolving anomalies because hand-written prompts fail to generalize beyond their initial design. This paper proposes Multi-agent Language Models with CLIP (MALM-CLIP), a novel method that integrates the generative capabilities of large language models (LLMs) with CLIP within a multi-agent framework. In this system, specialized agents handle distinct subtasks such as prompt generation and model evaluation, enabling automated, context-aware multimodal information fusion. By eliminating manual prompt engineering, MALM-CLIP improves both the accuracy and efficiency of anomaly detection. Experimental results on standard benchmarks such as MVTec AD and VisA show that the approach outperforms existing methods at image-level anomaly detection with minimal training data. This work highlights the potential of combining Generative Artificial Intelligence (GenAI) and multi-agent systems for robust few-shot industrial anomaly detection.
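To make the core idea concrete, the sketch below shows a minimal CLIP-based anomaly-scoring step of the kind the abstract describes. It is not the authors' MALM-CLIP implementation: the hard-coded prompt lists, the `anomaly_score` helper, and the checkpoint name are illustrative assumptions standing in for the text that the paper's LLM prompt-generation agent would produce automatically.

```python
# Hypothetical sketch of CLIP-based anomaly scoring, not the paper's actual
# MALM-CLIP pipeline. In MALM-CLIP, the prompt lists below would be produced
# and refined by LLM agents; here they are hard-coded placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder descriptions of normal and anomalous states of one object class.
normal_prompts = ["a photo of a flawless metal nut",
                  "a photo of an intact metal nut"]
anomaly_prompts = ["a photo of a scratched metal nut",
                   "a photo of a damaged metal nut"]

def anomaly_score(image: Image.Image) -> float:
    """Probability mass that CLIP assigns to the anomalous descriptions."""
    texts = normal_prompts + anomaly_prompts
    inputs = processor(text=texts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        # logits_per_image: (num_images, num_texts); take the single image row.
        logits = model(**inputs).logits_per_image[0]
    probs = logits.softmax(dim=-1)
    return probs[len(normal_prompts):].sum().item()

score = anomaly_score(Image.open("nut.png").convert("RGB"))
print(f"anomaly score: {score:.3f}")
```

Under this reading, the framework's contribution is to replace the hand-written prompt lists with agent-generated ones and to add an evaluation agent that scores candidate prompts, closing the loop without manual prompt engineering.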
Journal Introduction:
Information Fusion serves as a central platform for showcasing advances in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating applications to real-world problems, are welcome.