MSCD-VM-UNet: A Vision Mamba Combining Multi-Scale Global and Local Feature Extraction with Cross-Domain Feature Fusion for Medical Image Segmentation.
IF 6.8 | Region 2 (Medicine) | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS
Zhiyong Huang, Shuxin Wang, Mingyang Hou, Zhi Yu, Shiwei Wang, Xiaoyu Li, Yan Yan, Yushi Liu, Hans Gregersen
IEEE Journal of Biomedical and Health Informatics, volume PP. Journal Article, published 2025-06-02. DOI: 10.1109/JBHI.2025.3575447 (https://doi.org/10.1109/JBHI.2025.3575447)
Citations: 0
Abstract
Accurate segmentation of tissues and lesions is essential for diagnosis and treatment. State Space Models (SSMs) have gained attention for their linear complexity and ability to model long-range dependencies. However, the existing Mamba architecture relies on direct skip connections, which limits its ability to integrate multi-scale and multi-level features and handle boundary details effectively. To address these limitations, we propose the MSCD-VM-UNet architecture, which incorporates three novel modules: the Spatial Group Multi-Scale Attention Module (SGMAM), the Cross-Domain Feature Fusion Module (CDFFM), and the Attention-Based Feature Injection Module (ABFIM). The SGMAM captures multi-scale global and local information and adaptively adjusts feature importance to highlight key regions while suppressing noise. The CDFFM enhances boundary and detail handling by aligning semantic features from both the frequency and spatial domains. The ABFIM utilizes attention mechanisms to adaptively fuse and weigh features from different scales and semantics, promoting feature collaboration and improving the model's robustness in complex tasks. Experiments on multiple datasets show that these modules significantly enhance the accuracy of MSCD-VM-UNet, setting a new benchmark for medical image segmentation. Our code will be made available at https://github.com/StphenWang/MSCD-VM-UNet.
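The abstract does not give the internals of the CDFFM or ABFIM, so the following is only a minimal NumPy sketch of the general idea it describes: enriching a skip-connection feature map with frequency-domain (boundary) information and then injecting it into the decoder through an attention-style gate, instead of using a direct skip connection. The function names `high_pass_fft` and `fuse_skip`, the `cutoff` parameter, the FFT high-pass filter, and the channel-wise sigmoid gate are all assumptions made for illustration; they are not the authors' modules.

```python
import numpy as np


def high_pass_fft(feat, cutoff=0.1):
    """Keep spatial frequencies above `cutoff` (cycles/pixel) in each channel.

    Hypothetical stand-in for a frequency-domain branch: high frequencies
    carry edges and fine detail. `feat` has shape (channels, height, width).
    """
    _, h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    mask = (np.sqrt(fy**2 + fx**2) > cutoff).astype(float)
    spec = np.fft.fft2(feat, axes=(-2, -1))  # per-channel 2D FFT
    return np.fft.ifft2(spec * mask, axes=(-2, -1)).real


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_skip(encoder_feat, decoder_feat, cutoff=0.1):
    """Blend an enriched encoder map into the decoder map with a channel gate.

    Assumed design: spatial features plus their high-pass component form the
    enriched skip feature; a sigmoid gate computed from the decoder's global
    average per channel decides how much of it to inject.
    """
    enriched = encoder_feat + high_pass_fft(encoder_feat, cutoff)
    gate = sigmoid(decoder_feat.mean(axis=(1, 2), keepdims=True))
    return gate * enriched + (1.0 - gate) * decoder_feat


# Usage with random feature maps of matching shape (4 channels, 16x16).
rng = np.random.default_rng(0)
enc = rng.standard_normal((4, 16, 16))
dec = rng.standard_normal((4, 16, 16))
fused = fuse_skip(enc, dec)
print(fused.shape)  # (4, 16, 16)
```

In a trained network the gate and the frequency filter would be learned rather than fixed; the sketch only shows where frequency-domain and spatial-domain information meet on the skip path.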
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.