Xiangkai Shen;Lei Li;Yushan Ma;Shaofeng Xu;Jinhai Liu;Zhiguo Yang;Yan Shi
{"title":"VLCIM:用于工业缺陷检测的视觉语言循环交互模型","authors":"Xiangkai Shen;Lei Li;Yushan Ma;Shaofeng Xu;Jinhai Liu;Zhiguo Yang;Yan Shi","doi":"10.1109/TIM.2025.3583364","DOIUrl":null,"url":null,"abstract":"Accurate defect detection is an important element in ensuring product quality and safe equipment operation. However, due to the lack of deep cross-modal interactions (CMIs) during vision feature extraction, existing methods often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge and generic large model, effectively bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time, which integrates a recursive guidance module (RGM) and CMI strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) enhances discriminative decision responses, significantly improving the model’s boundary perception ability and decision-making accuracy in complex scenarios. VLCIM achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. The experimental results on three industrial datasets demonstrate that VLCIM achieves improvements of 5.9%, 5.6%, and 4.1% in mean intersection over union (mIoU) over the state-of-the-art (SOTA) methods, indicating its validity and generalization in different scenarios.","PeriodicalId":13341,"journal":{"name":"IEEE Transactions on Instrumentation and Measurement","volume":"74 ","pages":"1-13"},"PeriodicalIF":5.6000,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"VLCIM: A Vision-Language Cyclic Interaction Model for Industrial Defect Detection\",\"authors\":\"Xiangkai Shen;Lei Li;Yushan Ma;Shaofeng Xu;Jinhai Liu;Zhiguo Yang;Yan Shi\",\"doi\":\"10.1109/TIM.2025.3583364\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate defect detection is an important element in ensuring product quality and safe equipment operation. However, due to the lack of deep cross-modal interactions (CMIs) during vision feature extraction, existing methods often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge and generic large model, effectively bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time, which integrates a recursive guidance module (RGM) and CMI strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) enhances discriminative decision responses, significantly improving the model’s boundary perception ability and decision-making accuracy in complex scenarios. VLCIM achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. The experimental results on three industrial datasets demonstrate that VLCIM achieves improvements of 5.9%, 5.6%, and 4.1% in mean intersection over union (mIoU) over the state-of-the-art (SOTA) methods, indicating its validity and generalization in different scenarios.\",\"PeriodicalId\":13341,\"journal\":{\"name\":\"IEEE Transactions on Instrumentation and Measurement\",\"volume\":\"74 \",\"pages\":\"1-13\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2025-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Instrumentation and Measurement\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11052727/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Instrumentation and Measurement","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11052727/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
VLCIM: A Vision-Language Cyclic Interaction Model for Industrial Defect Detection
Accurate defect detection is an important element in ensuring product quality and safe equipment operation. However, due to the lack of deep cross-modal interactions (CMIs) during vision feature extraction, existing methods often suffer from attention bias, which ultimately limits detection accuracy. To address this issue, this article proposes a vision-language cyclic interaction model (VLCIM), which progressively optimizes vision feature extraction by integrating domain prior knowledge and generic large model, effectively bridging the dual-domain barrier between “generic-specific” and “vision-language.” Specifically, progressive cyclic interaction learning is proposed for the first time, which integrates a recursive guidance module (RGM) and CMI strategy to realize bidirectional dynamic fusion and collaborative optimization of vision and language features. Furthermore, the proposed dual-view synergistic detection mechanism (DSDM) enhances discriminative decision responses, significantly improving the model’s boundary perception ability and decision-making accuracy in complex scenarios. VLCIM achieves high-precision defect detection by establishing a cyclic interaction mechanism between domain-specific language features and vision representations. The experimental results on three industrial datasets demonstrate that VLCIM achieves improvements of 5.9%, 5.6%, and 4.1% in mean intersection over union (mIoU) over the state-of-the-art (SOTA) methods, indicating its validity and generalization in different scenarios.
期刊介绍:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.