{"title":"3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation","authors":"","doi":"10.1016/j.media.2024.103324","DOIUrl":null,"url":null,"abstract":"<div><p>Despite that the segment anything model (SAM) achieved impressive results on general-purpose semantic segmentation with strong generalization ability on daily images, its demonstrated performance on medical image segmentation is less precise and unstable, especially when dealing with tumor segmentation tasks that involve objects of small sizes, irregular shapes, and low contrast. Notably, the original SAM architecture is designed for 2D natural images and, therefore would not be able to extract the 3D spatial information from volumetric medical data effectively. In this paper, we propose a novel adaptation method for transferring SAM from 2D to 3D for promptable medical image segmentation. Through a holistically designed scheme for architecture modification, we transfer the SAM to support volumetric inputs while retaining the majority of its pre-trained parameters for reuse. The fine-tuning process is conducted in a parameter-efficient manner, wherein most of the pre-trained parameters remain frozen, and only a few lightweight spatial adapters are introduced and tuned. Regardless of the domain gap between natural and medical data and the disparity in the spatial arrangement between 2D and 3D, the transformer trained on natural images can effectively capture the spatial patterns present in volumetric medical images with only lightweight adaptations. We conduct experiments on four open-source tumor segmentation datasets, and with a single click prompt, our model can outperform domain state-of-the-art medical image segmentation models and interactive segmentation models. We also compared our adaptation method with existing popular adapters and observed significant performance improvement on most datasets. 
Our code and models are available at: <span><span>https://github.com/med-air/3DSAM-adapter</span><svg><path></path></svg></span></p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841524002494","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Although the Segment Anything Model (SAM) has achieved impressive results on general-purpose semantic segmentation, with strong generalization on everyday images, its performance on medical image segmentation is less precise and unstable, especially for tumor segmentation tasks involving objects of small size, irregular shape, and low contrast. Notably, the original SAM architecture is designed for 2D natural images and therefore cannot effectively extract 3D spatial information from volumetric medical data. In this paper, we propose a novel adaptation method for transferring SAM from 2D to 3D for promptable medical image segmentation. Through a holistically designed scheme of architecture modification, we adapt SAM to support volumetric inputs while retaining the majority of its pre-trained parameters for reuse. The fine-tuning process is conducted in a parameter-efficient manner, wherein most of the pre-trained parameters remain frozen and only a few lightweight spatial adapters are introduced and tuned. Despite the domain gap between natural and medical data and the disparity in spatial arrangement between 2D and 3D, the transformer trained on natural images can effectively capture the spatial patterns present in volumetric medical images with only lightweight adaptations. We conduct experiments on four open-source tumor segmentation datasets, and with a single click prompt, our model outperforms state-of-the-art medical image segmentation models and interactive segmentation models. We also compare our adaptation method with existing popular adapters and observe significant performance improvements on most datasets. Our code and models are available at: https://github.com/med-air/3DSAM-adapter
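The parameter-efficient fine-tuning described above can be illustrated with a minimal sketch: a pre-trained block is kept frozen while a small bottleneck adapter is inserted and trained. This is not the authors' implementation (their code is at the repository linked above); the class names, feature dimension, and bottleneck width here are assumptions chosen purely for demonstration, in dependency-free Python.

```python
# Illustrative sketch of adapter-based parameter-efficient tuning:
# the pre-trained block's weights stay frozen, and only a small
# bottleneck adapter (down-project -> ReLU -> up-project -> residual)
# contributes trainable parameters.
import random

random.seed(0)

def make_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class FrozenBlock:
    """Stands in for a pre-trained transformer block; weights are never updated."""
    def __init__(self, dim):
        self.w = make_matrix(dim, dim)
        self.trainable = False

    def forward(self, x):
        return matvec(self.w, x)

    def num_params(self):
        return len(self.w) * len(self.w[0])

class BottleneckAdapter:
    """Lightweight adapter: only these weights are tuned during fine-tuning."""
    def __init__(self, dim, bottleneck):
        self.down = make_matrix(bottleneck, dim)  # dim -> bottleneck
        self.up = make_matrix(dim, bottleneck)    # bottleneck -> dim
        self.trainable = True

    def forward(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]          # ReLU
        return [a + b for a, b in zip(x, matvec(self.up, h))]    # residual add

    def num_params(self):
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))

dim, bottleneck = 256, 16  # assumed sizes for illustration
block = FrozenBlock(dim)
adapter = BottleneckAdapter(dim, bottleneck)

x = [random.uniform(-1, 1) for _ in range(dim)]
y = adapter.forward(block.forward(x))  # adapted forward pass

frozen = block.num_params()
trainable = adapter.num_params()
print(f"frozen: {frozen}, trainable: {trainable} "
      f"({100 * trainable / (frozen + trainable):.1f}% of total)")
```

With these toy sizes, the adapter accounts for roughly 11% of the parameters; in practice, the same bottleneck-plus-residual pattern keeps the tuned fraction small while the frozen backbone supplies the pre-trained representation.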
Journal introduction:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.