VLCIM: A Vision-Language Cyclic Interaction Model for Industrial Defect Detection
Xiangkai Shen; Lei Li; Yushan Ma; Shaofeng Xu; Jinhai Liu; Zhiguo Yang; Yan Shi
IEEE Transactions on Instrumentation and Measurement, vol. 74, pp. 1-13. Published 2025-06-26. DOI: 10.1109/TIM.2025.3583364
https://ieeexplore.ieee.org/document/11052727/
Impact factor: 5.6. JCR: Q1 (Engineering, Electrical & Electronic). Open access: no.
Citation count: 0
Abstract
Accurate defect detection is essential to ensuring product quality and safe equipment operation. However, because existing methods lack deep cross-modal interactions (CMIs) during vision feature extraction, they often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge with a generic large model, bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time; it integrates a recursive guidance module (RGM) and a CMI strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) enhances discriminative decision responses, significantly improving the model’s boundary perception and decision-making accuracy in complex scenarios. VLCIM thus achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. Experimental results on three industrial datasets show that VLCIM improves mean intersection over union (mIoU) by 5.9%, 5.6%, and 4.1% over state-of-the-art (SOTA) methods, indicating its validity and generalization across different scenarios.
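For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union is commonly computed for segmentation masks. It uses the standard definition (per-class intersection divided by union, averaged over classes that occur), not the paper's evaluation code; the abstract does not state the class set or whether scores are averaged per image or per dataset. The function name `mean_iou` and the toy masks are hypothetical.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over the classes that appear in either mask.

    Assumption: pred and target are equally sized 2-D label maps whose
    entries are class indices in [0, num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it counts once in both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    # Skip classes absent from both masks to avoid dividing by zero.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks (hypothetical data): 0 = background, 1 = defect.
pred = [[0, 1, 1], [0, 0, 1]]
target = [[0, 1, 0], [0, 1, 1]]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.2f}")
```

In this toy case each class has two agreeing pixels out of a union of four, so both per-class IoUs and their mean are 0.5.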
About the journal:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.