DPNet: A dual prototype few-shot semantic segmentation network for crack detection

IF 7.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2025-05-27 DOI:10.1016/j.knosys.2025.113733

Xiaoming Chen, Zhangyan Zhao, Jingjing Cao, Yuhang Zou, Haipeng Liu

Knowledge-Based Systems, vol. 323, Article 113733 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0950705125007798

Citations: 0

Abstract

Road crack detection is crucial for maintaining the aesthetics and safety of roads. Because crack morphology varies widely, road crack samples are often insufficient, which limits the effectiveness of existing detection methods in few-sample scenarios. When visual samples are scarce, using textual information to help extract visual information from images is an emerging approach. In this paper, we propose a Dual Prototype Network (DPNet) for few-shot crack detection. First, we introduce an Improved Pixel Weight (IPW) data enhancement that strengthens the foreground and edges of cropped samples, improving learning efficiency when samples are insufficient. Next, we design a dual prototype prediction method. Specifically, we use domain-related text input to generate a Language-Image Prototype (LIP) carrying general domain knowledge through Contrastive Language-Image Pre-training (CLIP). We then generate a Support Prototype (SuP) carrying specialized domain knowledge from crack dataset images. The final prediction is obtained by linearly combining the predictions of the two prototypes. Additionally, we design an Embedding Attention Module (EAM), which leverages the characteristics of the embedding dimension to provide both spatial and channel attention within the transformer structure. DPNet achieves superior performance on the FCrack-i and MixCrack few-sample datasets, with average mIoU improvements of 8.52% and 1.44%, respectively, over the baseline. Moreover, we demonstrate the zero-shot capability of DPNet on the CFD crack dataset.
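The dual prototype prediction described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything specific here is an assumption rather than the authors' implementation: the support prototype is built with masked average pooling, each prototype scores query pixels by cosine similarity, and the names `alpha`, `threshold`, `masked_average_pool`, `cosine_score_map`, and `dual_prototype_predict` are hypothetical. The text prototype stands in for a CLIP-derived embedding and is just a plain vector in this example.

```python
import numpy as np


def masked_average_pool(features, mask):
    """Average the feature vectors at foreground pixels (assumed pooling scheme).

    features: (C, H, W) feature map; mask: (H, W) binary foreground mask.
    Returns a (C,) prototype vector.
    """
    weights = mask.astype(features.dtype)
    total = (features * weights).sum(axis=(1, 2))
    return total / max(weights.sum(), 1e-8)


def cosine_score_map(features, prototype):
    """Cosine similarity between each pixel feature and a prototype, shape (H, W)."""
    dots = np.einsum("c,chw->hw", prototype, features)
    feature_norms = np.linalg.norm(features, axis=0) + 1e-8
    prototype_norm = np.linalg.norm(prototype) + 1e-8
    return dots / (feature_norms * prototype_norm)


def dual_prototype_predict(query_features, text_prototype, support_prototype,
                           alpha=0.5, threshold=0.5):
    """Linearly combine the score maps of the two prototypes.

    alpha and threshold are illustrative values, not taken from the paper.
    """
    lip_score = cosine_score_map(query_features, text_prototype)
    sup_score = cosine_score_map(query_features, support_prototype)
    combined = alpha * lip_score + (1.0 - alpha) * sup_score
    return combined, combined > threshold


# Toy usage with random features (no real CLIP model or crack images involved).
rng = np.random.default_rng(0)
support_features = rng.normal(size=(8, 4, 4))
support_mask = np.zeros((4, 4), dtype=bool)
support_mask[1:3, 1:3] = True
sup_prototype = masked_average_pool(support_features, support_mask)
lip_prototype = rng.normal(size=8)  # placeholder for a text-derived prototype
scores, crack_mask = dual_prototype_predict(support_features, lip_prototype, sup_prototype)
```

In this sketch a single weight `alpha` balances general knowledge (text prototype) against dataset-specific knowledge (support prototype); how the paper sets or learns the combination weights is not stated in the abstract.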




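Source journal

Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months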

Journal description: Knowledge-Based Systems is an international and interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.