DMM-UNet: dual-path multi-scale Mamba UNet for medical image segmentation.
Authors: Liquan Zhao, Mingxia Cao, Yanfei Jia
Journal: Journal of Medical Imaging, vol. 12, no. 5, 054003 (JCR Q3, Radiology, Nuclear Medicine & Medical Imaging; impact factor 1.7)
Published: 2025-09-01 (Epub 2025-09-29)
DOI: https://doi.org/10.1117/1.JMI.12.5.054003
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12480969/pdf/
Purpose: State space models have shown promise in medical image segmentation by modeling long-range dependencies with linear complexity. However, they are limited in their ability to capture local features, which hinders their capacity to extract multiscale details and integrate global and local contextual information effectively. To address these shortcomings, we propose the dual-path multi-scale Mamba UNet (DMM-UNet) model.
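As a rough illustration of the linear-complexity claim, the sketch below runs a discrete linear state space recurrence over a 1D sequence in a single pass. It is not the DMM-UNet or Mamba implementation: the scalar state, the function name `ssm_scan`, and the fixed parameters `a`, `b`, `c` are assumptions chosen for brevity, whereas selective scanning makes such parameters input-dependent.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a sequence.

    a, b, c are hypothetical fixed scalars; each step does constant work,
    so the whole scan costs O(len(xs)).
    """
    h = 0.0  # hidden state carries information from all earlier inputs
    ys = []
    for x in xs:
        h = a * h + b * x  # state update
        ys.append(c * h)   # readout
    return ys


if __name__ == "__main__":
    # An impulse at t=0 decays geometrically, showing long-range carry-over.
    print(ssm_scan([1.0, 0.0, 0.0]))
```

Because the state summarizes the entire prefix, an early input can still influence a late output without any pairwise comparison between positions.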
Approach: This architecture facilitates deep fusion of local and global features through multi-scale modules within a U-shaped encoder-decoder framework. First, we introduce the multi-scale channel attention selective scanning block in the encoder, which combines global selective scanning with multi-scale channel attention to model both long-range and local dependencies simultaneously. Second, we design the spatial attention selective scanning block for the decoder. This block integrates global scanning with spatial attention mechanisms, enabling precise aggregation of semantic features through gated weighting. Finally, we develop the multi-dimensional collaborative attention layer to extract complementary attention weights across height, width, and channel dimensions, facilitating cross-space-channel feature interactions.
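The idea of extracting attention weights along the height, width, and channel dimensions can be sketched as follows. The abstract does not specify how the weights are computed or combined, so everything here is an assumption for illustration: average pooling over the other two axes, a sigmoid gate, and averaging of the three weights. The names `axis_weights` and `apply_attention` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def axis_weights(x):
    """Return (channel, height, width) weights for a [C][H][W] nested list.

    Assumption: each weight is the sigmoid of the mean over the other two axes.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    ch = [sigmoid(sum(x[c][h][w] for h in range(H) for w in range(W)) / (H * W))
          for c in range(C)]
    hh = [sigmoid(sum(x[c][h][w] for c in range(C) for w in range(W)) / (C * W))
          for h in range(H)]
    ww = [sigmoid(sum(x[c][h][w] for c in range(C) for h in range(H)) / (C * H))
          for w in range(W)]
    return ch, hh, ww


def apply_attention(x):
    """Scale each element by the average of its three axis weights (assumed fusion)."""
    ch, hh, ww = axis_weights(x)
    return [[[x[c][h][w] * (ch[c] + hh[h] + ww[w]) / 3.0
              for w in range(len(ww))]
             for h in range(len(hh))]
            for c in range(len(ch))]


if __name__ == "__main__":
    feature_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
    print(apply_attention(feature_map))
```

Averaging keeps the output on the same scale as the input; a product of the three weights would be an equally plausible choice under these assumptions.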
Results: Experiments were conducted on the ISIC17, ISIC18, Synapse, and ACDC datasets. DMM-UNet achieved a Dice similarity coefficient of 89.88% on ISIC17, 90.52% on ISIC18, 83.07% on Synapse, and 92.60% on ACDC. The model also performed well on other evaluation metrics.
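For reference, the Dice similarity coefficient for binary masks is commonly defined as twice the overlap divided by the total foreground size of both masks. The minimal sketch below assumes flat binary masks and returns 1.0 when both masks are empty; that convention and the function name `dice_coefficient` are assumptions, and the paper's exact evaluation protocol (for example, per-class averaging on Synapse and ACDC) is not reproduced here.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.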
Conclusions: The DMM-UNet model effectively addresses the shortcomings of state space models by enabling the integration of both local and global features, improving segmentation performance, and offering enhanced multiscale feature fusion for medical image segmentation tasks.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.