VLCIM: A Vision-Language Cyclic Interaction Model for Industrial Defect Detection

IF 5.6 · Zone 2, Engineering & Technology (CAS ranking) · Q1, ENGINEERING, ELECTRICAL & ELECTRONIC (JCR)

IEEE Transactions on Instrumentation and Measurement · Pub Date: 2025-06-26 · DOI: 10.1109/TIM.2025.3583364

Xiangkai Shen, Lei Li, Yushan Ma, Shaofeng Xu, Jinhai Liu, Zhiguo Yang, Yan Shi


Citations: 0

Abstract

Accurate defect detection is essential for ensuring product quality and safe equipment operation. However, because existing methods lack deep cross-modal interactions (CMIs) during vision feature extraction, they often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge with a generic large model, bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time; it combines a recursive guidance module (RGM) with a CMI strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) strengthens discriminative decision responses, significantly improving the model’s boundary perception and decision accuracy in complex scenarios. By establishing a cyclic interaction mechanism between domain-specific language features and vision representations, VLCIM achieves high-precision defect detection. Experimental results on three industrial datasets show that VLCIM improves mean intersection over union (mIoU) by 5.9%, 5.6%, and 4.1%, respectively, over state-of-the-art (SOTA) methods, demonstrating its effectiveness and generalization across different scenarios.
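The results above are reported in mIoU. As a reference point, the sketch below computes the standard form of the metric: for each class c, IoU_c = |pred ∩ target| / |pred ∪ target| over the pixels labeled c, then the mean over classes. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether background is counted as a class, or whether statistics are accumulated per image or over the whole dataset. Here a single pair of label maps is evaluated, and classes absent from both maps are skipped. The function name `mean_iou` is hypothetical.

```python
from typing import List, Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean intersection over union for one pair of 2-D label maps.

    Assumption (not from the paper): every class in [0, num_classes),
    including background, is averaged, and a class absent from both
    maps is skipped instead of being scored as IoU = 1.
    """
    # Both maps must have the same number of rows and identical row lengths.
    if len(pred) != len(target) or any(len(a) != len(b) for a, b in zip(pred, target)):
        raise ValueError("pred and target must have the same shape")

    # Flatten row-major so that pixels can be compared pairwise.
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]

    ious: List[float] = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(flat_pred, flat_target) if p == c and t == c)
        union = sum(1 for p, t in zip(flat_pred, flat_target) if p == c or t == c)
        if union == 0:
            continue  # class c appears in neither map: leave it out of the mean
        ious.append(inter / union)

    if not ious:
        raise ValueError("no class is present in either map")
    return sum(ious) / len(ious)


# Usage: 0 = background, 1 = defect.
pred = [[0, 0, 1], [0, 1, 1]]
target = [[0, 0, 1], [0, 0, 1]]
print(mean_iou(pred, target, num_classes=2))  # (3/4 + 2/3) / 2
```

Skipping absent classes keeps a class that never occurs in an image from inflating the score. Other protocols, such as summing intersections and unions over the whole dataset before dividing, can produce different numbers, so mIoU values are only comparable when they use the same convention.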

Source Journal

IEEE Transactions on Instrumentation and Measurement (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.00
Self-citation rate: 23.20%
Articles published: 1294
Review time: 3.9 months

Journal description: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.