Ze Gao , Jing Guo , Liming Chen , Kai Wang , Yang Chen , Yongzhen Ke , Shuai Yang
AnDR-BLIP2: enhanced semantic understanding framework for industrial image anomaly detection and report generation
Journal of The Franklin Institute-engineering and Applied Mathematics, Volume 362, Issue 12, Article 107816. Published 2025-06-21.
DOI: 10.1016/j.jfranklin.2025.107816
URL: https://www.sciencedirect.com/science/article/pii/S0016003225003096
Impact factor: 4.2 (JCR Q2, Automation & Control Systems)
Abstract
The rapid development of Large Multimodal Models (LMMs) has demonstrated their powerful image understanding ability. However, when applied to downstream tasks such as industrial anomaly detection, they often fall short because of limitations in image parsing ability, pre-training data, and training strategy. Specifically, they struggle to understand the detailed semantics of the abnormal parts of an image. As LLM performance continues to improve, the Industrial Image Anomaly Detection Report Generation (IADRG) task may emerge as a new challenge. In this paper, we define the IADRG task as a deeper image understanding task and propose a solution for it: AnDR-BLIP2, a dual-branch large multimodal model that combines the BLIP2 model with a SAM visual understanding branch to enhance detailed feature extraction from images. Additionally, we use mixed semantic pre-training on general and industrial image data to strengthen the model's ability to understand abnormal content in industrial anomaly detection tasks. Furthermore, our model leverages SAM's pixel-level feature parsing ability to integrate a prompt-based zero-shot industrial anomaly segmentation method into report generation. Experimental results on the MVTec AD and VisA datasets demonstrate that our model accurately understands industrial image anomalies and achieves strong performance in zero-shot anomaly segmentation.
About the journal:
The Journal of The Franklin Institute has an established reputation for publishing high-quality papers in the field of engineering and applied mathematics. Its current focus is on control systems, complex networks and dynamic systems, signal processing and communications and their applications. All submitted papers are peer-reviewed. The Journal will publish original research papers and research review papers of substance. Papers and special focus issues are judged upon possible lasting value, which has been and continues to be the strength of the Journal of The Franklin Institute.