VLCIM: A Vision-Language Cyclic Interaction Model for Industrial Defect Detection

IF 5.6, Region 2 (Engineering & Technology), Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Instrumentation and Measurement Pub Date : 2025-06-26 DOI:10.1109/TIM.2025.3583364

Xiangkai Shen;Lei Li;Yushan Ma;Shaofeng Xu;Jinhai Liu;Zhiguo Yang;Yan Shi

Volume 74, pages 1-13 | Journal Article | https://ieeexplore.ieee.org/document/11052727/

Citations: 0

Abstract



Accurate defect detection is an important element in ensuring product quality and safe equipment operation. However, due to the lack of deep cross-modal interactions (CMIs) during vision feature extraction, existing methods often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge and generic large model, effectively bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time, which integrates a recursive guidance module (RGM) and CMI strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) enhances discriminative decision responses, significantly improving the model’s boundary perception ability and decision-making accuracy in complex scenarios. VLCIM achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. The experimental results on three industrial datasets demonstrate that VLCIM achieves improvements of 5.9%, 5.6%, and 4.1% in mean intersection over union (mIoU) over the state-of-the-art (SOTA) methods, indicating its validity and generalization in different scenarios.
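For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union (mIoU) is commonly computed for per-pixel class labels. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the choice to skip classes that appear in neither prediction nor ground truth are all assumptions made for illustration, and the abstract does not state which convention the paper uses.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flattened label lists.

    Assumption (not from the paper): a class absent from both
    pred and target is skipped instead of being counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: 0 = background, 1 = defect (hypothetical labels).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
print(score)
```

In this toy case each class has an intersection of 2 pixels and a union of 4, so both per-class IoUs are 0.5 and so is their mean.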


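Source journal

IEEE Transactions on Instrumentation and Measurement (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.00
Self-citation rate: 23.20%
Articles published: 1294
Review time: 3.9 months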

Journal scope: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.