Crack segmentation network via difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention
Jianming Zhang, Shigen Zhang, Dianwen Li, Jianxin Wang, Jin Wang
Pattern Recognition, Volume 167, Article 111723. Published 2025-04-25. Impact factor 7.5 (JCR Q1, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.patcog.2025.111723
https://www.sciencedirect.com/science/article/pii/S0031320325003838
Citations: 0
Abstract
Cracks are the most common pavement defects, and failure to repair them promptly can lead to more severe road damage. Because cracks are thin, long, and irregular, measuring them precisely remains a challenge. To tackle these issues, a network (DCCM-Net) based on a difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention is proposed. First, an enhanced convolution module (ECM) is designed as a core component of the encoder to extract the edge information of cracks. Besides our proposed diagonal difference convolution operator, the ECM uses five types of difference convolution operators to capture the edge information of crack images along the horizontal, vertical, and diagonal directions in the Cartesian coordinate system and along the polar axis and polar angle directions in the polar coordinate system, respectively. Second, because the ECM-based encoder extracts only local features, a mixed convolution and Mamba (MixConv-Mamba) attention module is proposed for the skip connections. This module uses multi-scale depthwise separable convolutions to extract rich spatial features and a Mamba block to extract global features. The resulting features are then processed with self-attention mechanisms, so that the final features capture both spatial and long-range dependencies. Third, a feature fusion module (FFM) is introduced as a key component of the decoder to fuse deep-layer and shallow-layer features effectively. Finally, compared with nine advanced networks on three public datasets (CrackTree260, DeepCrack, and CrackForest), our network demonstrates superior performance, achieving mean intersection over union (mIoU) scores of 84.92%, 86.60%, and 80.61%, respectively.
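The abstract does not give the formulas for the ECM's difference convolution operators, so the following is only a minimal NumPy sketch of the general idea behind a difference convolution, using the generic central-difference form in which kernel weights are applied to (neighbor minus center) differences rather than to raw intensities. It is not the DCCM-Net implementation. The function names, the toy image, and the fixed kernel are all hypothetical choices made for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation, stride 1, no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def central_difference_conv2d(image, kernel):
    """Generic central-difference convolution (illustrative assumption).

    Each weight multiplies the difference between a neighbor and the
    patch center, so flat regions respond with zero and edges stand out.
    """
    kh, kw = kernel.shape
    ch, cw = kh // 2, kw // 2  # index of the patch center
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(kernel * (patch - patch[ch, cw]))
    return out


# Toy 6x6 image: a dark one-pixel-wide vertical "crack" on a bright background.
image = np.ones((6, 6))
image[:, 3] = 0.0

# Fixed stand-in for learned weights (hypothetical values).
kernel = np.arange(1.0, 10.0).reshape(3, 3)

flat_response = central_difference_conv2d(np.ones((6, 6)), kernel)
crack_response = central_difference_conv2d(image, kernel)

print(flat_response)   # all zeros: no intensity differences in a flat image
print(crack_response)  # each row is [0, -18, 30, -12]
```

In this sketch the response vanishes wherever the patch is uniform and is nonzero only where a patch overlaps the crack, which is the property that makes difference-based operators attractive for thin, edge-like structures. The same output can be obtained as `conv2d_valid(image, kernel) - kernel.sum() * image[1:-1, 1:-1]`, which shows that this form differs from an ordinary convolution only by a center-pixel correction term.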
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.