Crack segmentation network via difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention

IF 7.5 | CAS Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition | Pub Date: 2025-04-25 | DOI: 10.1016/j.patcog.2025.111723

Jianming Zhang, Shigen Zhang, Dianwen Li, Jianxin Wang, Jin Wang

Pattern Recognition, Volume 167, Article 111723. ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0031320325003838

Citations: 0

Abstract

Cracks are the most common pavement defect, and failing to repair them promptly can lead to more severe road damage. Because cracks are thin, long, and irregular, measuring them precisely remains challenging. To address this, we propose DCCM-Net, a network built on a difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention. First, an enhanced convolution module (ECM) is designed as the core component of the encoder to extract crack edge information. The ECM uses five types of difference convolution operators, including our proposed diagonal difference convolution operator, to capture edge information in crack images along the horizontal, vertical, and diagonal directions of the Cartesian coordinate system and along the polar axis and polar angle directions of the polar coordinate system. Second, because the ECM-based encoder extracts only local features, a mixed convolution and Mamba (MixConv-Mamba) attention module is proposed for the skip connections. This module uses multi-scale depthwise separable convolutions to extract rich spatial features and a Mamba block to extract global features. The resulting features are then processed by self-attention, so the final features capture both spatial and long-range dependencies. Third, a feature fusion module (FFM) is introduced as a key component of the decoder to fuse deep-layer and shallow-layer features effectively. Finally, compared with nine advanced networks on three public datasets (CrackTree260, DeepCrack, and CrackForest), our network achieves superior performance, with mean intersection over union (mIoU) scores of 84.92%, 86.60%, and 80.61%, respectively.
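The abstract does not give the operator formulas, so the following is only a minimal pure-Python sketch of the general difference-convolution idea, under a loudly stated assumption: each operator applies an ordinary 3x3 kernel to pixel differences x[p] - x[p - d] rather than to raw pixels, where d is a direction offset. The `DIRECTIONS` table, the `pixel` and `difference_conv` helpers, and the replicate padding are hypothetical choices for illustration. Only the three Cartesian directions are shown (the polar-axis and polar-angle operators are omitted), and this is not DCCM-Net's actual ECM implementation.

```python
"""Minimal sketch of directional difference convolution (assumed formulation).

Hypothetical simplification: a 3x3 kernel is applied to pixel differences
x[p] - x[p - d] instead of raw pixels, where d is a direction offset.
This is NOT the exact ECM operator set of DCCM-Net.
"""
from typing import Dict, List, Tuple

Image = List[List[float]]
Kernel = List[List[float]]

# Hypothetical (row, col) offsets for the three Cartesian directions.
# The polar-axis and polar-angle operators are omitted from this sketch.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "horizontal": (0, 1),  # left-right differences respond to vertical edges
    "vertical": (1, 0),    # up-down differences respond to horizontal edges
    "diagonal": (1, 1),    # differences along the main diagonal
}


def pixel(img: Image, r: int, c: int) -> float:
    """Read a pixel, replicating border values for out-of-range indices."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def difference_conv(img: Image, kernel: Kernel, direction: str) -> Image:
    """Apply a 3x3 kernel (CNN-style cross-correlation) to directional differences."""
    dr, dc = DIRECTIONS[direction]
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    # Difference between this tap and its neighbour one step back along d.
                    diff = pixel(img, rr, cc) - pixel(img, rr - dr, cc - dc)
                    acc += kernel[i][j] * diff
            out[r][c] = acc
    return out


if __name__ == "__main__":
    # A vertical step edge: the left two columns are 0, the right two are 1.
    edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    ones = [[1.0] * 3 for _ in range(3)]
    print(difference_conv(edge, ones, "horizontal")[0])  # [0.0, 3.0, 3.0, 3.0]
    print(difference_conv(edge, ones, "vertical")[0])    # [0.0, 0.0, 0.0, 0.0]
```

On this vertical step edge, differences taken along the horizontal axis give a non-zero response next to the step, while differences along the vertical axis cancel to zero everywhere. That directional selectivity is the intuition behind combining operators along several directions, although the paper's operators may be defined differently.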




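Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months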

Journal description: The field of pattern recognition is both mature and rapidly evolving. It plays a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks, intersects closely with machine learning, and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago in the early days of computer science, has since grown significantly in scope and influence.