DPNet: A dual prototype few-shot semantic segmentation network for crack detection
Xiaoming Chen, Zhangyan Zhao, Jingjing Cao, Yuhang Zou, Haipeng Liu
Knowledge-Based Systems, Volume 323, Article 113733. Published 2025-05-27. DOI: 10.1016/j.knosys.2025.113733
Impact factor: 7.2 (JCR Q1, Computer Science, Artificial Intelligence)
https://www.sciencedirect.com/science/article/pii/S0950705125007798
Citations: 0
Abstract
Road crack detection is crucial for maintaining the aesthetics and safety of roads. Because crack morphology varies widely, road crack samples are often insufficient, which limits the effectiveness of existing detection methods in few-sample scenarios. When visual samples are scarce, using textual information to guide the extraction of visual information from images is a cutting-edge approach. In this paper, we propose a Dual Prototype Network (DPNet) for few-shot crack detection. First, we introduce an Improved Pixel Weight (IPW) data enhancement that strengthens the foreground and edges of cropped samples, improving learning efficiency when samples are insufficient. Next, we design a dual prototype prediction method. Specifically, we use domain-related text input to generate a Language-Image Prototype (LIP) carrying general domain knowledge through Contrastive Language-Image Pre-training (CLIP). We then generate a Support Prototype (SuP) carrying specialized domain knowledge from crack dataset images. The final prediction is obtained by linearly combining the predictions of the two prototypes. Additionally, we design an Embedding Attention Module (EAM), which leverages the characteristics of the embedding dimension to provide both spatial and channel attention within the transformer structure. Finally, DPNet achieves superior performance on the FCrack-i and MixCrack few-sample datasets, with average mIoU improvements of 8.52% and 1.44%, respectively, over the baseline. Moreover, we demonstrate the zero-shot capability of DPNet on the CFD crack dataset.
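The abstract states only that the final prediction is a linear combination of the predictions from the two prototypes; it does not give the formula. The sketch below illustrates one plausible reading and is not the authors' implementation. It assumes, hypothetically, that the Support Prototype is obtained by masked average pooling over support features, that each prototype scores query pixels by cosine similarity, and that a single weight `alpha` mixes the two score maps. All function names, `alpha`, and `threshold` are illustrative, and the text prototype is passed in as a plain vector rather than produced by CLIP.

```python
import math


def masked_average_pooling(features, mask):
    """Mean feature vector over foreground pixels (assumed way to build SuP)."""
    foreground = [f for f, m in zip(features, mask) if m]
    if not foreground:
        raise ValueError("mask has no foreground pixels")
    dim = len(foreground[0])
    return [sum(f[c] for f in foreground) / len(foreground) for c in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prototype_scores(query_features, prototype):
    """Score every query pixel against one prototype."""
    return [cosine(q, prototype) for q in query_features]


def combine_predictions(lip_scores, sup_scores, alpha=0.5):
    """Linear combination: alpha * LIP score + (1 - alpha) * SuP score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * l + (1.0 - alpha) * s for l, s in zip(lip_scores, sup_scores)]


def dual_prototype_predict(query_features, support_features, support_mask,
                           text_prototype, alpha=0.5, threshold=0.5):
    """Return a binary crack mask for the query pixels (1 = crack)."""
    sup_prototype = masked_average_pooling(support_features, support_mask)
    sup_scores = prototype_scores(query_features, sup_prototype)
    lip_scores = prototype_scores(query_features, text_prototype)
    fused = combine_predictions(lip_scores, sup_scores, alpha)
    return [1 if score > threshold else 0 for score in fused]


# Toy data: each pixel is a 2-dimensional feature vector (made-up values).
support_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_mask = [1, 1, 0]          # first two support pixels are crack
text_prototype = [1.0, 0.2]       # stand-in for a CLIP-derived text embedding
query_features = [[0.8, 0.1], [0.1, 0.9]]

predicted_mask = dual_prototype_predict(
    query_features, support_features, support_mask, text_prototype
)
print(predicted_mask)
```

In this toy setup the first query pixel aligns with both prototypes and the second aligns with neither, so the fused score separates them. In practice the mixing weight would be tuned or learned, and the features would come from an image encoder rather than hand-written lists.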
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.