MSCD-VM-UNet: A Vision Mamba Combining Multi-Scale Global and Local Feature Extraction With Cross-Domain Feature Fusion for Medical Image Segmentation.

IF 6.8 · CAS Zone 2 (Medicine) · JCR Q1 (Computer Science, Information Systems)

IEEE Journal of Biomedical and Health Informatics · Pub Date: 2025-10-01 · DOI: 10.1109/JBHI.2025.3575447

Zhiyong Huang, Shuxin Wang, Mingyang Hou, Zhi Yu, Shiwei Wang, Xiaoyu Li, Yan Yan, Yushi Liu, Hans Gregersen

Volume: PP · Pages: 7312-7325 · Publication type: Journal Article

Citations: 0

Abstract

Accurate segmentation of tissues and lesions is essential for diagnosis and treatment. State Space Models (SSMs) have gained attention for their linear complexity and ability to model long-range dependencies. However, the existing Mamba architecture relies on direct skip connections, which limits its ability to integrate multi-scale and multi-level features and handle boundary details effectively. To address these limitations, we propose the MSCD-VM-UNet architecture, which incorporates three novel modules: the Spatial Group Multi-Scale Attention Module (SGMAM), the Cross-Domain Feature Fusion Module (CDFFM), and the Attention-Based Feature Injection Module (ABFIM). The SGMAM captures multi-scale global and local information and adaptively adjusts feature importance to highlight key regions while suppressing noise. The CDFFM enhances boundary and detail handling by aligning semantic features from both the frequency and spatial domains. The ABFIM utilizes attention mechanisms to adaptively fuse and weigh features from different scales and semantics, promoting feature collaboration and improving the model's robustness in complex tasks. Experiments on multiple datasets show that these modules significantly enhance the accuracy of MSCD-VM-UNet, setting a new benchmark for medical image segmentation.
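
The linear-complexity claim for State Space Models can be illustrated with a minimal sketch of a discrete state-space recurrence. This is not the MSCD-VM-UNet code or any of its modules: the function name `ssm_scan` and the fixed scalar parameters `a`, `b`, `c` are assumptions chosen for illustration, whereas Mamba-style models use multi-dimensional states with input-dependent parameters. The sketch only shows that one pass over the sequence costs time proportional to its length, and that an early input keeps influencing later outputs.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar discrete state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    One loop iteration per element, so the cost is linear in len(x).
    """
    h = 0.0  # hidden state, initialised to zero (illustrative choice)
    y = []
    for x_t in x:
        h = a * h + b * x_t
        y.append(c * h)
    return y


# An impulse at t = 0 still shows up in later outputs, decaying as a**t.
impulse = [1.0, 0.0, 0.0, 0.0]
response = ssm_scan(impulse, a=0.5, b=1.0, c=1.0)
print(response)  # [1.0, 0.5, 0.25, 0.125]
```

With `a` closer to 1 the state decays more slowly, which is one simple way to see how such a recurrence can carry information across long ranges.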

Our code will be available at https://github.com/StphenWang/MSCD-VM-UNet.

Source journal

IEEE Journal of Biomedical and Health Informatics (Computer Science, Information Systems; Computer Science, Interdisciplinary Applications)

CiteScore: 13.60

Self-citation rate: 6.50%

Articles published: 1151

Journal description: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.