DMM-UNet: dual-path multi-scale Mamba UNet for medical image segmentation.

Impact factor: 1.7; JCR Q3; category: Radiology, Nuclear Medicine & Medical Imaging

Journal of Medical Imaging. Pub Date: 2025-09-01. Epub Date: 2025-09-29. DOI: 10.1117/1.JMI.12.5.054003

Liquan Zhao, Mingxia Cao, Yanfei Jia

Citations: 0

Abstract

Purpose: State space models have shown promise in medical image segmentation by modeling long-range dependencies with linear complexity. However, they are limited in their ability to capture local features, which hinders their capacity to extract multiscale details and integrate global and local contextual information effectively. To address these shortcomings, we propose the dual-path multi-scale Mamba UNet (DMM-UNet) model.
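To make the linear-complexity claim concrete, here is a minimal sketch of a discrete linear state space recurrence, h_t = A h_{t-1} + B x_t, y_t = C h_t, run over a 1D sequence. It is not taken from the paper: the function name `ssm_scan` and the toy parameters are assumptions for illustration, and Mamba-style selective scanning additionally makes the parameters depend on the input, which this sketch omits.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a discrete linear state space model over a 1D sequence.

    h_t = A @ h_{t-1} + B * x_t,   y_t = C @ h_t
    The loop visits each element once, so cost grows linearly with length.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A @ h + B * x_t  # state carries information from all earlier steps
        y[t] = C @ h         # scalar readout from the hidden state
    return y

# Toy parameters (illustrative only, not from the paper).
A = np.array([[0.9, 0.0], [0.0, 0.5]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])
y = ssm_scan(np.array([1.0, 0.0, 0.0]), A, B, C)
```

The single impulse at the first step still influences later outputs through the hidden state, which is how such models carry long-range context without comparing every pair of positions.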

Approach: This architecture facilitates deep fusion of local and global features through multi-scale modules within a U-shaped encoder-decoder framework. First, we introduce the multi-scale channel attention selective scanning block in the encoder, which combines global selective scanning with multi-scale channel attention to model both long-range and local dependencies simultaneously. Second, we design the spatial attention selective scanning block for the decoder. This block integrates global scanning with spatial attention mechanisms, enabling precise aggregation of semantic features through gated weighting. Finally, we develop the multi-dimensional collaborative attention layer to extract complementary attention weights across height, width, and channel dimensions, facilitating cross-space-channel feature interactions.
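The abstract gives no equations for the multi-dimensional collaborative attention layer, so the following is only a rough sketch of the general idea of computing attention weights along the channel, height, and width axes of a feature map. The function `tri_axis_attention`, the use of mean pooling, and the averaging of the three branches are all assumptions made for illustration; the authors' layer presumably uses learned parameters and may combine the branches differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tri_axis_attention(x):
    """Reweight a (C, H, W) feature map with one gate per axis.

    Each gate is the sigmoid of the mean over the other two axes.
    A trained layer would normally pass the pooled descriptors through
    learned weights first; that step is omitted here.
    """
    w_c = sigmoid(x.mean(axis=(1, 2), keepdims=True))  # shape (C, 1, 1)
    w_h = sigmoid(x.mean(axis=(0, 2), keepdims=True))  # shape (1, H, 1)
    w_w = sigmoid(x.mean(axis=(0, 1), keepdims=True))  # shape (1, 1, W)
    # Average the three reweighted views so each axis contributes equally.
    return (x * w_c + x * w_h + x * w_w) / 3.0

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))
out = tri_axis_attention(features)  # same shape as the input
```

Broadcasting applies each gate along its own axis, so the output keeps the input shape and can be dropped into a skip connection or a later block unchanged.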

Results: Experiments were conducted on the ISIC17, ISIC18, Synapse, and ACDC datasets. DMM-UNet achieved a Dice similarity coefficient of 89.88% on ISIC17, 90.52% on ISIC18, 83.07% on Synapse, and 92.60% on ACDC. The model also performed well on the other evaluation metrics.
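For readers unfamiliar with the metric, the Dice similarity coefficient for two binary masks P and T is 2|P ∩ T| / (|P| + |T|). The sketch below computes it with NumPy; the function name `dice_coefficient`, the `eps` smoothing term, and the toy masks are assumptions for illustration and do not reproduce the paper's evaluation code.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice score for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps keeps the ratio defined when both masks are empty (scored as 1.0).
    return (2.0 * intersection + eps) / (total + eps)

# Toy masks: 3 predicted pixels, 3 ground-truth pixels, 2 in common.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_coefficient(pred, target)  # 2*2 / (3+3), about 0.667
```

For multi-class datasets such as Synapse and ACDC, a common convention is to compute this per class and average, though the abstract does not state which averaging the authors used.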

Conclusions: The DMM-UNet model effectively addresses the shortcomings of state space models by enabling the integration of both local and global features, improving segmentation performance, and offering enhanced multiscale feature fusion for medical image segmentation tasks.

Source journal

Journal of Medical Imaging (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore

4.10

Self-citation rate

4.20%

Number of publications

Journal description: JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.