Taming large vision model for medical image segmentation via Dual Visual Prompt Tuning

IF 4.9; Zone 2, Medicine; Q1 ENGINEERING, BIOMEDICAL

Computerized Medical Imaging and Graphics. Pub Date: 2025-07-19. DOI: 10.1016/j.compmedimag.2025.102608

Ruize Cui, Lanqing Liu, Jing Zou, Xiaowei Hu, Jialun Pei, Jing Qin

Computerized Medical Imaging and Graphics, Volume 124, Article 102608. Journal Article. https://www.sciencedirect.com/science/article/pii/S089561112500117X

Citations: 0

Abstract

This paper presents Dual Visual Prompt Tuning (DVPT), an innovative strategy to enhance the performance of the Segment Anything Model (SAM) for medical image segmentation. While SAM demonstrates robust generalization in natural image segmentation, its effectiveness in medical tasks is hindered by the distinct characteristics of medical targets, the presence of noise and artifacts, and insufficient task-specific data for fine-tuning. Moreover, the manual-prompting paradigm applied in SAM makes it laborious to adapt to the medical domain. To address these challenges, DVPT employs a fully automatic prompting paradigm and assembles both image-specific local and global guidance into SAM through two components: the Local Feature Prompt Tuning (LFPT) module, which enhances local information capture of detailed anatomical structures, and the Global Guiding Prompt (GGP) encoder, which mitigates noise interference and strengthens the identification of ambiguous boundaries within medical images. By integrating both local and global prompts within the mask decoder, the proposed DVPT yields superior segmentation accuracy. Experimental results across three medical image segmentation tasks consistently demonstrate that our method outperforms current state-of-the-art approaches. Our method significantly contributes to accurate and impactful computer-assisted diagnostics, promoting advancements in healthcare solutions. Our code is available at https://github.com/cuiruize/DVPT.
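As a rough illustration of the data flow the abstract describes (local prompts plus a global prompt, both handed to a mask decoder), the following Python sketch may help. It is not the authors' implementation, which is in the linked repository. The window-average pooling, the toy feature map, and all function names (`local_prompts`, `global_prompt`, `build_decoder_prompts`) are assumptions made purely for illustration; the abstract does not specify how LFPT or the GGP encoder compute their outputs.

```python
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[c] for v in vectors) / n for c in range(dim)]


def local_prompts(fmap: FeatureMap, window: int) -> List[Vector]:
    """Hypothetical stand-in for LFPT: one token per non-overlapping window.

    Assumption: local detail is summarised by average pooling each window.
    """
    h, w = len(fmap), len(fmap[0])
    tokens = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            patch = [
                fmap[y][x]
                for y in range(top, min(top + window, h))
                for x in range(left, min(left + window, w))
            ]
            tokens.append(mean_vector(patch))
    return tokens


def global_prompt(fmap: FeatureMap) -> Vector:
    """Hypothetical stand-in for the GGP encoder: one token for the whole image.

    Assumption: global context is summarised by averaging every position.
    """
    return mean_vector([vec for row in fmap for vec in row])


def build_decoder_prompts(fmap: FeatureMap, window: int = 2) -> List[Vector]:
    """Concatenate local and global tokens into one prompt list for a decoder."""
    return local_prompts(fmap, window) + [global_prompt(fmap)]


# Toy 4x4 feature map with 2 channels: channel 0 is the row, channel 1 the column.
toy = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
prompts = build_decoder_prompts(toy, window=2)
print(len(prompts))  # four local tokens followed by one global token
```

The point of the sketch is only that the prompts are derived automatically from the image features, with no manual clicks or boxes, and that local and global tokens reach the decoder together.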




Source journal

Computerized Medical Imaging and Graphics (Medicine: Nuclear Medicine)

CiteScore: 10.70

Self-citation rate: 3.50%

Publication volume:

Review time: 26 days

Journal description: The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.