Improving Surface Defect Detection for Trains Based on Visual-Language Knowledge Guidance on Tiny Datasets

IF 8.4 | Zone 1, Engineering & Technology | Q1, ENGINEERING, CIVIL

IEEE Transactions on Intelligent Transportation Systems | Pub Date: 2025-04-15 | DOI: 10.1109/TITS.2025.3532731

Kaiyan Lei; Zhiquan Qi; Jin Song

Volume 26, Issue 6, Pages 9080-9093
Publication type: Journal Article
Open access: No
Source: https://ieeexplore.ieee.org/document/10965935/

Citations: 0

Abstract


Efficient and accurate detection of surface defects on trains is crucial for train safety. However, defect samples are scarce and their patterns are diverse, which makes defect detection in complex environments highly challenging. This paper proposes ViLG, a train surface defect detection model guided by visual-language knowledge. By leveraging the broad semantic knowledge of CLIP, the model compensates for the limited defect semantics available in tiny datasets and improves recognition of unseen defects. First, we propose Visual Feature Guidance with CLIP, which enriches the global representation capability of the backbone while preserving its ability to learn visual representations on its own. This improves semantic understanding of complex scenes and diverse defects. Second, we propose the Defect Query Selector, which selects defect queries according to the semantic relevance between the texts and the global feature embeddings. This focuses attention on potential defects and reduces missed detections. Finally, we propose the Semantic Consistency Loss, which semantically aligns defect queries with defect prompts; the additional cross-modal supervision signal refines the semantics of defects. For real-world scenarios in which normal reference images are available, we propose ViLG+, which filters false positives using feature similarity. ViLG+ further verifies that the global embeddings represent both the overall structure of a visual scene and its subtle local features. Compared with other advanced methods on two train surface defect datasets and two public defect datasets, ViLG achieves higher precision, recall, and average precision on unseen defects at relatively faster speed, with average improvements of 23.58, 3.23, and 6.05, respectively, and shows more balanced false positive and false negative rates.
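The abstract names three similarity-based components but gives no formulas for them. The following is only a rough sketch of how such components could look, assuming cosine similarity as the relevance measure and plain Python lists as embeddings. All function names, the top-k selection rule, the `1 - cosine` loss form, and the `threshold` value are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_defect_queries(feature_embeddings, prompt_embeddings, k):
    """Hypothetical selector: indices of the k features most relevant to any defect prompt."""
    # Score each visual feature by its best match among the text prompts.
    scores = [max(cosine(f, p) for p in prompt_embeddings) for f in feature_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def consistency_loss(query_embeddings, prompt_embeddings):
    """Hypothetical alignment loss: mean (1 - cosine) over paired queries and prompts."""
    pairs = list(zip(query_embeddings, prompt_embeddings))
    return sum(1.0 - cosine(q, p) for q, p in pairs) / len(pairs)


def filter_false_positives(detections, reference_embeddings, threshold=0.9):
    """Hypothetical ViLG+-style filter: drop detections that resemble a normal reference."""
    kept = []
    for label, embedding in detections:
        best = max(cosine(embedding, r) for r in reference_embeddings)
        # A detection that closely matches a defect-free reference is treated as a false positive.
        if best < threshold:
            kept.append((label, embedding))
    return kept


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    prompts = [[1.0, 0.0]]
    print(select_defect_queries(features, prompts, k=2))

    detections = [("scratch", [1.0, 0.0]), ("stain", [0.0, 1.0])]
    references = [[0.0, 2.0]]
    print(filter_false_positives(detections, references))
```

In this toy setup, the selector ranks features by their best prompt match, and the filter keeps only detections whose embedding differs enough from every normal reference. A real detector would operate on learned CLIP and backbone embeddings and on batched tensors rather than Python lists.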


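Source journal

IEEE Transactions on Intelligent Transportation Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.80

Self-citation rate: 12.90%

Articles published: 1872

Review time: 7.5 months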

Journal description: The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.