MambaMIM: Pre-training Mamba with state space token interpolation and its application to medical image segmentation

IF 10.7 · CAS Tier 1 (Medicine) · Q1 Computer Science, Artificial Intelligence
Fenghe Tang, Bingkun Nian, Yingtai Li, Zihang Jiang, Jie Yang, Wei Liu, S. Kevin Zhou
{"title":"MambaMIM: Pre-training Mamba with state space token interpolation and its application to medical image segmentation","authors":"Fenghe Tang ,&nbsp;Bingkun Nian ,&nbsp;Yingtai Li ,&nbsp;Zihang Jiang ,&nbsp;Jie Yang ,&nbsp;Wei Liu ,&nbsp;S. Kevin Zhou","doi":"10.1016/j.media.2025.103606","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, the state space model Mamba has demonstrated efficient long-sequence modeling capabilities, particularly for addressing long-sequence visual tasks in 3D medical imaging. However, existing generative self-supervised learning methods have not yet fully unleashed Mamba’s potential for handling long-range dependencies because they overlook the inherent causal properties of state space sequences in masked modeling. To address this challenge, we propose a general-purpose pre-training framework called MambaMIM, a masked image modeling method based on a novel <strong>TOKen-Interpolation</strong> strategy (TOKI) for the selective structure state space sequence, which learns causal relationships of state space within the masked sequence. Further, MambaMIM introduces a bottom-up 3D hybrid masking strategy to maintain a <strong>masking consistency</strong> across different architectures and can be used on any single or hybrid Mamba architecture to enhance its multi-scale and long-range representation capability. We pre-train MambaMIM on a large-scale dataset of 6.8K CT scans and evaluate its performance across eight public medical segmentation benchmarks. Extensive downstream experiments reveal the feasibility and advancement of using Mamba for medical image pre-training. In particular, when we apply the MambaMIM to a customized architecture that hybridizes MedNeXt and Vision Mamba, we consistently obtain the state-of-the-art segmentation performance. The code is available at: <span><span>https://github.com/FengheTan9/MambaMIM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103606"},"PeriodicalIF":10.7000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841525001537","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Recently, the state space model Mamba has demonstrated efficient long-sequence modeling capabilities, particularly for addressing long-sequence visual tasks in 3D medical imaging. However, existing generative self-supervised learning methods have not yet fully unleashed Mamba's potential for handling long-range dependencies because they overlook the inherent causal properties of state space sequences in masked modeling. To address this challenge, we propose a general-purpose pre-training framework called MambaMIM, a masked image modeling method based on a novel TOKen-Interpolation strategy (TOKI) for selective structured state space sequences, which learns the causal relationships of the state space within the masked sequence. Further, MambaMIM introduces a bottom-up 3D hybrid masking strategy that maintains masking consistency across different architectures and can be used on any single or hybrid Mamba architecture to enhance its multi-scale and long-range representation capability. We pre-train MambaMIM on a large-scale dataset of 6.8K CT scans and evaluate its performance across eight public medical segmentation benchmarks. Extensive downstream experiments demonstrate the feasibility and advantages of using Mamba for medical image pre-training. In particular, when we apply MambaMIM to a customized architecture that hybridizes MedNeXt and Vision Mamba, we consistently obtain state-of-the-art segmentation performance. The code is available at: https://github.com/FengheTan9/MambaMIM.
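The abstract names the TOKen-Interpolation (TOKI) idea without spelling out its formulation, so the following is a minimal, hypothetical sketch of one plausible reading: masked positions are filled by linearly interpolating between the nearest visible tokens, so that a causal left-to-right state-space scan sees a gap-free sequence. The function name toki_interpolate, the 1D token layout, and the linear weighting are all illustrative assumptions, not the paper's actual method; MambaMIM itself operates on 3D CT patch tokens with a hybrid masking scheme.

```python
import torch

def toki_interpolate(vis_feat: torch.Tensor, vis_pos: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Hypothetical token interpolation: rebuild a full-length sequence by
    linearly interpolating masked positions between their nearest visible
    neighbors, keeping the causal scan order of the state space intact.

    vis_feat: (V, D) features of the visible (unmasked) tokens
    vis_pos:  (V,)   sorted indices of those tokens within [0, seq_len)
    """
    V, D = vis_feat.shape
    out = torch.zeros(seq_len, D, dtype=vis_feat.dtype)
    out[vis_pos] = vis_feat
    # Fill each masked gap between consecutive visible tokens.
    for i in range(V - 1):
        a, b = int(vis_pos[i]), int(vis_pos[i + 1])
        for p in range(a + 1, b):
            w = (p - a) / (b - a)                      # weight toward the right neighbor
            out[p] = (1.0 - w) * vis_feat[i] + w * vis_feat[i + 1]
    out[: int(vis_pos[0])] = vis_feat[0]               # clamp positions before the first visible token
    out[int(vis_pos[-1]) + 1 :] = vis_feat[-1]         # clamp positions after the last one
    return out

# Toy usage: 4 visible tokens out of a 16-token sequence, 32-dim features.
full = toki_interpolate(torch.randn(4, 32), torch.tensor([1, 5, 9, 14]), seq_len=16)
print(full.shape)  # torch.Size([16, 32])
```

Under this reading, the interpolated sequence would then be fed to a Mamba block and trained with a reconstruction loss on the masked voxels, in the usual masked-image-modeling fashion.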
Source journal: Medical Image Analysis (Engineering & Technology: Biomedical Engineering)
CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months
Journal introduction: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.