VLCIM: A Vision-Language Cyclic Interaction Model for Industrial Defect Detection

IF 5.6 · Zone 2 (Engineering & Technology) · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Xiangkai Shen;Lei Li;Yushan Ma;Shaofeng Xu;Jinhai Liu;Zhiguo Yang;Yan Shi
DOI: 10.1109/TIM.2025.3583364
Journal: IEEE Transactions on Instrumentation and Measurement, vol. 74, pp. 1-13
Published: 2025-06-26
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/11052727/
Citations: 0

Abstract

Accurate defect detection is essential to ensuring product quality and safe equipment operation. However, because existing methods lack deep cross-modal interactions (CMIs) during vision feature extraction, they often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge with a generic large model, bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time; it integrates a recursive guidance module (RGM) and a CMI strategy to achieve bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) enhances discriminative decision responses, significantly improving the model’s boundary perception and decision-making accuracy in complex scenarios. VLCIM thus achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. Experimental results on three industrial datasets show that VLCIM improves mean intersection over union (mIoU) by 5.9%, 5.6%, and 4.1% over state-of-the-art (SOTA) methods, indicating its effectiveness and generalization across different scenarios.
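For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed for segmentation-style defect maps. It is not taken from the paper: the function name `mean_iou`, the flattened label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration, and the abstract does not say whether the authors average per image or per dataset, or whether background counts as a class.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flattened label sequences.

    Hypothetical helper for illustration; classes that appear in neither
    `pred` nor `target` are skipped rather than counted as 0 or 1.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both maps
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 defect map flattened row by row: 0 = background, 1 = defect.
pred = [0, 0, 1, 0, 1, 1]
target = [0, 1, 1, 0, 1, 1]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In this toy case the background IoU is 2/3 and the defect IoU is 3/4, so the mean is 17/24. An improvement reported "in mIoU" is a difference between two such averages computed on the same test set.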
Source journal
IEEE Transactions on Instrumentation and Measurement (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 9.00
Self-citation rate: 23.20%
Articles published: 1294
Review time: 3.9 months
Journal scope: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.