{"title":"SIT-SAM:一种将任意片段模型应用于零镜头医学图像语义分割的语义集成转换器","authors":"Wentao Shi , Junjun He , Yiqing Shen","doi":"10.1016/j.bspc.2025.108086","DOIUrl":null,"url":null,"abstract":"<div><div>Segment Anything Model (SAM) demonstrates zero-shot instance segmentation capabilities through prompt-guided interaction. However, its application to 3D medical imaging remains limited due to insufficient semantic understanding of complex anatomical structures. Current SAM variants attempt to address this challenge through architectural modifications and fine-tuning ; however, these approaches often compromise SAM’s original zero-shot capabilities. To bridge this gap, we introduce the semantic integration Transformer for SAM (SIT-SAM), an innovative post-processing framework that enhances SAM’s instance-level masks with semantic comprehension of anatomical structures. Our approach preserves SAM’s valuable zero-shot capabilities while introducing semantic awareness. Specifically, SIT-SAM comprises of three functional blocks: (1) the original SAM for instance mask generation, (2) a semantic integration transformer that combines hierarchical multi-scale feature extraction to capture both fine anatomical details and global context while leveraging instance mask geometry for enhanced anatomical structure understanding, (3) a cognitive science-inspired memory module for learning from limited training data. Evaluation on the TotalSegmentator dataset demonstrates SIT-SAM’s superior performance, achieving 90.55% accuracy, substantially outperforming the fine-tuned baseline <em>i</em>.<em>e</em>. SAM-Med3D with fully convolutional network (FCN) prediction head by 52.69%. SIT-SAM also exhibits robustness in data-constrained environments, delivering a 2.43% improvement with single-point prompt and maintaining effectiveness with multiple prompts, showing a 0.78% gain using ten point prompts. Code is available at <span><span>https://github.com/wentao0429/SIT-SAM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"110 ","pages":"Article 108086"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SIT-SAM: A semantic-integration transformer that adapts the Segment Anything Model to zero-shot medical image semantic segmentation\",\"authors\":\"Wentao Shi , Junjun He , Yiqing Shen\",\"doi\":\"10.1016/j.bspc.2025.108086\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Segment Anything Model (SAM) demonstrates zero-shot instance segmentation capabilities through prompt-guided interaction. However, its application to 3D medical imaging remains limited due to insufficient semantic understanding of complex anatomical structures. Current SAM variants attempt to address this challenge through architectural modifications and fine-tuning ; however, these approaches often compromise SAM’s original zero-shot capabilities. To bridge this gap, we introduce the semantic integration Transformer for SAM (SIT-SAM), an innovative post-processing framework that enhances SAM’s instance-level masks with semantic comprehension of anatomical structures. Our approach preserves SAM’s valuable zero-shot capabilities while introducing semantic awareness. 
Specifically, SIT-SAM comprises of three functional blocks: (1) the original SAM for instance mask generation, (2) a semantic integration transformer that combines hierarchical multi-scale feature extraction to capture both fine anatomical details and global context while leveraging instance mask geometry for enhanced anatomical structure understanding, (3) a cognitive science-inspired memory module for learning from limited training data. Evaluation on the TotalSegmentator dataset demonstrates SIT-SAM’s superior performance, achieving 90.55% accuracy, substantially outperforming the fine-tuned baseline <em>i</em>.<em>e</em>. SAM-Med3D with fully convolutional network (FCN) prediction head by 52.69%. SIT-SAM also exhibits robustness in data-constrained environments, delivering a 2.43% improvement with single-point prompt and maintaining effectiveness with multiple prompts, showing a 0.78% gain using ten point prompts. Code is available at <span><span>https://github.com/wentao0429/SIT-SAM</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":55362,\"journal\":{\"name\":\"Biomedical Signal Processing and Control\",\"volume\":\"110 \",\"pages\":\"Article 108086\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biomedical Signal Processing and Control\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S174680942500597X\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Signal Processing and Control","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S174680942500597X","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
SIT-SAM: A semantic-integration transformer that adapts the Segment Anything Model to zero-shot medical image semantic segmentation
Segment Anything Model (SAM) demonstrates zero-shot instance segmentation capabilities through prompt-guided interaction. However, its application to 3D medical imaging remains limited due to insufficient semantic understanding of complex anatomical structures. Current SAM variants attempt to address this challenge through architectural modifications and fine-tuning; however, these approaches often compromise SAM’s original zero-shot capabilities. To bridge this gap, we introduce the Semantic Integration Transformer for SAM (SIT-SAM), an innovative post-processing framework that augments SAM’s instance-level masks with semantic comprehension of anatomical structures. Our approach preserves SAM’s valuable zero-shot capabilities while introducing semantic awareness. Specifically, SIT-SAM comprises three functional blocks: (1) the original SAM for instance mask generation; (2) a semantic integration transformer that performs hierarchical multi-scale feature extraction to capture both fine anatomical detail and global context, while leveraging instance-mask geometry for a richer understanding of anatomical structure; and (3) a cognitive science-inspired memory module for learning from limited training data. Evaluation on the TotalSegmentator dataset demonstrates SIT-SAM’s superior performance: it achieves 90.55% accuracy, outperforming the fine-tuned baseline (SAM-Med3D with a fully convolutional network (FCN) prediction head) by 52.69%. SIT-SAM is also robust in data-constrained environments, delivering a 2.43% improvement with a single point prompt and maintaining effectiveness with multiple prompts, showing a 0.78% gain with ten point prompts. Code is available at https://github.com/wentao0429/SIT-SAM.
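To make the three-block design easier to picture, below is a minimal PyTorch sketch, not the authors' code: the module shapes, the EMA prototype memory, the logit fusion, and the `sam_model(volume, point_prompt)` API are all illustrative assumptions, and the actual implementation lives in the linked repository.

```python
# A minimal, illustrative sketch of the three-block pipeline named in the
# abstract. Every class, shape, and the prototype-memory rule below is an
# assumption for illustration only; the authors' actual implementation is at
# https://github.com/wentao0429/SIT-SAM.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticIntegrationTransformer(nn.Module):
    """Block (2): fuse multi-scale image features with instance-mask geometry
    and assign a semantic class to a SAM-generated instance mask."""

    def __init__(self, feat_dim=256, num_classes=104, num_layers=4):
        super().__init__()
        # Hypothetical: embed the binary mask so its geometry can be added
        # to the image features as a spatial cue.
        self.mask_proj = nn.Conv3d(1, feat_dim, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, multi_scale_feats, instance_mask):
        # multi_scale_feats: list of (B, C, D_i, H_i, W_i) feature volumes;
        # instance_mask: (B, 1, D, H, W) binary mask produced by SAM.
        mask_emb = self.mask_proj(instance_mask.float())
        tokens = []
        for feats in multi_scale_feats:
            # Resample each scale to the mask resolution, inject the mask
            # geometry, then pool to one token per scale so the encoder sees
            # both fine detail (high-res scales) and global context.
            feats = F.interpolate(feats, size=mask_emb.shape[2:])
            tokens.append((feats + mask_emb).flatten(2).mean(-1))  # (B, C)
        x = self.encoder(torch.stack(tokens, dim=1))  # (B, n_scales, C)
        pooled = x.mean(dim=1)
        return pooled, self.classifier(pooled)        # features, class logits


class PrototypeMemory(nn.Module):
    """Block (3): one plausible reading of the 'cognitive science-inspired
    memory' -- per-class prototypes updated by an exponential moving average,
    which helps when only a few labeled volumes are available."""

    def __init__(self, feat_dim=256, num_classes=104, momentum=0.9):
        super().__init__()
        self.register_buffer("prototypes", torch.zeros(num_classes, feat_dim))
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats, labels):
        # EMA update of the prototype for each observed class label.
        for f, y in zip(feats, labels):
            self.prototypes[y] = (self.momentum * self.prototypes[y]
                                  + (1 - self.momentum) * f)

    def forward(self, feats):
        # Cosine similarity of each instance feature to every class prototype.
        return F.normalize(feats, dim=-1) @ F.normalize(self.prototypes, dim=-1).T


def classify_instance(sam_model, sit, memory, volume, point_prompt):
    # Block (1): the frozen, unmodified SAM turns a point prompt into a
    # class-agnostic instance mask (this sam_model API is assumed).
    instance_mask, multi_scale_feats = sam_model(volume, point_prompt)
    pooled, logits = sit(multi_scale_feats, instance_mask)
    # Assumed fusion: blend transformer logits with memory evidence.
    return logits + memory(pooled)
```

The design point the sketch tries to preserve is that block (1) stays frozen, so SAM’s zero-shot behavior is untouched, while blocks (2) and (3) are trained on top of its masks as pure post-processing.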
Journal overview:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.