WamGLM: A multimodal large-scale language model for wafer map defect information in-depth query through multi-turn dialogue based on prototypical supervised contrastive learning
IF 6.6 · Zone 1, Computer Science · Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Shulong Gu , Zihao Lei , Guangrui Wen , Quanning Xu , Zhaojun Steven Li , Xuefeng Chen , Chunsheng Yang
Applied Soft Computing, Volume 185, Article 113962. Publication date: 2025-09-18. DOI: 10.1016/j.asoc.2025.113962
https://www.sciencedirect.com/science/article/pii/S156849462501275X
Citations: 0
Abstract
In semiconductor manufacturing, detecting wafer map defects and querying defect information are critical for tracing and resolving process problems, and therefore for maintaining production efficiency and process stability. Numerous deep-learning vision models have been successfully applied to wafer map defect recognition (WMDR), yielding remarkable results. However, dynamic, in-depth querying of wafer map defect information remains relatively underexplored. Leveraging rapid advances in multimodal large language models (MLLMs), this paper proposes a novel approach to wafer map defect information query (WMDIQ). First, following the paradigm of using a cross-modal alignment model to bridge vision and language models, an end-to-end response MLLM, general language model for wafer map (WamGLM), is constructed for WMDIQ. Concurrently, an interactive dialogue framework between large language models (LLMs) is designed and used to construct the first large-scale multi-turn dialogue dataset for wafer map defect analysis: the visual multi-turn question answering dataset for wafer map defects (WaferMapVMQA Dataset). WamGLM is then trained with a two-stage fine-tuning strategy. In the first stage, a visual fine-tuning method based on prototypical supervised contrastive learning (PSCL) is introduced to enhance the intra-class compactness and inter-class separability of defect features. In the second stage, language fine-tuning on the WaferMapVMQA Dataset infuses specialized knowledge into WamGLM. To validate the effectiveness of the proposed method, experiments are conducted on a real wafer map dataset. The results demonstrate that the proposed method significantly outperforms other methods in both defect recognition performance and information query response performance. Our code is available at: https://github.com/ZihaoLei/WamGLM.
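The abstract does not give the PSCL formulation, so the following is only a minimal sketch of one generic prototype-based supervised contrastive objective, not the authors' method. It assumes each class prototype is the mean of that class's L2-normalized features and that the loss is a temperature-scaled cross-entropy over cosine similarities to the prototypes. The function names (`class_prototypes`, `prototype_contrastive_loss`), the temperature value, and the toy 2-D features are all hypothetical. The sketch illustrates the stated goal: features that are compact within a class and separated between classes give a lower loss.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length; all-zero vectors stay zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def class_prototypes(features, labels, num_classes):
    """Assumed prototype definition: mean of normalized features per class."""
    feats = l2_normalize(features)
    protos = np.zeros((num_classes, feats.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = feats[mask].mean(axis=0)
    return l2_normalize(protos)


def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """Cross-entropy over cosine similarities between samples and prototypes.

    Pulls each sample toward its own class prototype and pushes it away
    from the prototypes of the other classes.
    """
    feats = l2_normalize(features)
    logits = feats @ prototypes.T / temperature           # shape (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())


# Toy example (hypothetical data): two defect classes in a 2-D feature space.
labels = np.array([0, 0, 1, 1])

# Compact within each class and well separated between classes.
tight = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
# Each class is split across both directions, so the classes overlap fully.
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

tight_loss = prototype_contrastive_loss(
    tight, labels, class_prototypes(tight, labels, 2))
mixed_loss = prototype_contrastive_loss(
    mixed, labels, class_prototypes(mixed, labels, 2))

print(f"tight features loss: {tight_loss:.6f}")
print(f"mixed features loss: {mixed_loss:.6f}")
```

In the overlapping case both prototypes coincide, so the loss equals the chance level log(2) for two classes, while the compact, separated features give a loss close to zero. A practical implementation would compute this on mini-batches of encoder outputs in a deep learning framework; the repository linked in the abstract is the reference for the actual method.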
About the journal:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the web site will continuously be updated with new articles and the publication time will be short.