Improving Surface Defect Detection for Trains Based on Visual-Language Knowledge Guidance on Tiny Datasets
Authors: Kaiyan Lei; Zhiquan Qi; Jin Song
Journal: IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 6, pp. 9080-9093
Published: 2025-04-15
DOI: 10.1109/TITS.2025.3532731
URL: https://ieeexplore.ieee.org/document/10965935/
Efficient and accurate detection of surface defects on trains is crucial for ensuring train safety. However, scarce defect samples and their diverse patterns make defect detection in complex environments highly challenging. This paper proposes a novel train surface defect detection model (ViLG) based on visual-language knowledge guidance. By leveraging the broad semantic knowledge of CLIP, the model compensates for the insufficient defect semantics in tiny datasets and enhances the ability to recognize unseen defects. First, we propose Visual Feature Guidance with CLIP, which enriches the global representation capabilities of the backbone while preserving its self-learning ability for visual representations. This improves semantic understanding of complex scenarios and diverse defects. Second, we propose a Defect Query Selector, which selects defect queries based on the semantic relevance between text and global feature embeddings. This increases attention to potential defects and reduces missed detections. Finally, we propose a Semantic Consistency Loss, which semantically aligns defect queries with defect prompts. With additional cross-modal supervision signals, it refines the semantics of defects. For real-world scenarios with normal reference images, we propose ViLG+, which effectively filters false positives using feature similarity. This further verifies that the global embeddings effectively represent the overall structure of visual scenes as well as subtle local features. Compared with other advanced methods on two train surface defect datasets and two public defect datasets, ViLG achieves higher precision, recall, and average precision on unseen defects at a relatively fast speed, with average improvements of 23.58, 3.23, and 6.05, respectively, and has a more balanced false positive rate and false negative rate.
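The abstract describes two similarity-driven steps: selecting defect queries by their semantic relevance to text embeddings, and, in ViLG+, filtering false positives by comparing features against a normal reference image. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes cosine similarity as the relevance measure (the abstract does not name one), and the function names `select_queries` and `filter_false_positives`, the toy vectors, and the threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_queries(feature_embeddings, text_embedding, k):
    """Hypothetical selector: indices of the k features most similar to a text embedding."""
    scores = [cosine(f, text_embedding) for f in feature_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def filter_false_positives(detections, reference_embedding, threshold):
    """Hypothetical filter: drop detections that look too much like the normal reference."""
    return [
        d for d in detections
        if cosine(d["embedding"], reference_embedding) < threshold
    ]


# Toy 2-D embeddings standing in for real image and text features (assumed values).
features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
defect_prompt = [1.0, 0.1]
selected = select_queries(features, defect_prompt, k=2)

detections = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
]
normal_reference = [0.0, 1.0]
kept = filter_false_positives(detections, normal_reference, threshold=0.9)

print(selected)                 # indices of the selected features
print([d["id"] for d in kept])  # detections that differ from the normal reference
```

In this toy setup, a detection is kept only when its embedding differs enough from the normal reference. In practice, the embeddings would come from a vision-language model such as CLIP, and the threshold would need tuning per dataset.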
About the journal:
IEEE Transactions on Intelligent Transportation Systems covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.