{"title":"SIT-SAM:一种将任意片段模型应用于零镜头医学图像语义分割的语义集成转换器","authors":"Wentao Shi , Junjun He , Yiqing Shen","doi":"10.1016/j.bspc.2025.108086","DOIUrl":null,"url":null,"abstract":"<div><div>Segment Anything Model (SAM) demonstrates zero-shot instance segmentation capabilities through prompt-guided interaction. However, its application to 3D medical imaging remains limited due to insufficient semantic understanding of complex anatomical structures. Current SAM variants attempt to address this challenge through architectural modifications and fine-tuning ; however, these approaches often compromise SAM’s original zero-shot capabilities. To bridge this gap, we introduce the semantic integration Transformer for SAM (SIT-SAM), an innovative post-processing framework that enhances SAM’s instance-level masks with semantic comprehension of anatomical structures. Our approach preserves SAM’s valuable zero-shot capabilities while introducing semantic awareness. Specifically, SIT-SAM comprises of three functional blocks: (1) the original SAM for instance mask generation, (2) a semantic integration transformer that combines hierarchical multi-scale feature extraction to capture both fine anatomical details and global context while leveraging instance mask geometry for enhanced anatomical structure understanding, (3) a cognitive science-inspired memory module for learning from limited training data. Evaluation on the TotalSegmentator dataset demonstrates SIT-SAM’s superior performance, achieving 90.55% accuracy, substantially outperforming the fine-tuned baseline <em>i</em>.<em>e</em>. SAM-Med3D with fully convolutional network (FCN) prediction head by 52.69%. SIT-SAM also exhibits robustness in data-constrained environments, delivering a 2.43% improvement with single-point prompt and maintaining effectiveness with multiple prompts, showing a 0.78% gain using ten point prompts. Code is available at <span><span>https://github.com/wentao0429/SIT-SAM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"110 ","pages":"Article 108086"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SIT-SAM: A semantic-integration transformer that adapts the Segment Anything Model to zero-shot medical image semantic segmentation\",\"authors\":\"Wentao Shi , Junjun He , Yiqing Shen\",\"doi\":\"10.1016/j.bspc.2025.108086\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Segment Anything Model (SAM) demonstrates zero-shot instance segmentation capabilities through prompt-guided interaction. However, its application to 3D medical imaging remains limited due to insufficient semantic understanding of complex anatomical structures. Current SAM variants attempt to address this challenge through architectural modifications and fine-tuning ; however, these approaches often compromise SAM’s original zero-shot capabilities. To bridge this gap, we introduce the semantic integration Transformer for SAM (SIT-SAM), an innovative post-processing framework that enhances SAM’s instance-level masks with semantic comprehension of anatomical structures. Our approach preserves SAM’s valuable zero-shot capabilities while introducing semantic awareness. 
Specifically, SIT-SAM comprises of three functional blocks: (1) the original SAM for instance mask generation, (2) a semantic integration transformer that combines hierarchical multi-scale feature extraction to capture both fine anatomical details and global context while leveraging instance mask geometry for enhanced anatomical structure understanding, (3) a cognitive science-inspired memory module for learning from limited training data. Evaluation on the TotalSegmentator dataset demonstrates SIT-SAM’s superior performance, achieving 90.55% accuracy, substantially outperforming the fine-tuned baseline <em>i</em>.<em>e</em>. SAM-Med3D with fully convolutional network (FCN) prediction head by 52.69%. SIT-SAM also exhibits robustness in data-constrained environments, delivering a 2.43% improvement with single-point prompt and maintaining effectiveness with multiple prompts, showing a 0.78% gain using ten point prompts. Code is available at <span><span>https://github.com/wentao0429/SIT-SAM</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":55362,\"journal\":{\"name\":\"Biomedical Signal Processing and Control\",\"volume\":\"110 \",\"pages\":\"Article 108086\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biomedical Signal Processing and Control\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S174680942500597X\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Signal Processing and Control","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S174680942500597X","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
SIT-SAM: A semantic-integration transformer that adapts the Segment Anything Model to zero-shot medical image semantic segmentation
Segment Anything Model (SAM) demonstrates zero-shot instance segmentation capabilities through prompt-guided interaction. However, its application to 3D medical imaging remains limited due to insufficient semantic understanding of complex anatomical structures. Current SAM variants attempt to address this challenge through architectural modifications and fine-tuning; however, these approaches often compromise SAM’s original zero-shot capabilities. To bridge this gap, we introduce the Semantic Integration Transformer for SAM (SIT-SAM), an innovative post-processing framework that augments SAM’s instance-level masks with semantic comprehension of anatomical structures. Our approach preserves SAM’s valuable zero-shot capabilities while introducing semantic awareness. Specifically, SIT-SAM comprises three functional blocks: (1) the original SAM for instance mask generation; (2) a semantic integration transformer that performs hierarchical multi-scale feature extraction to capture both fine anatomical detail and global context, while leveraging instance-mask geometry for a richer understanding of anatomical structure; and (3) a cognitive science-inspired memory module for learning from limited training data. Evaluation on the TotalSegmentator dataset demonstrates SIT-SAM’s superior performance: it achieves 90.55% accuracy, outperforming the fine-tuned baseline (SAM-Med3D with a fully convolutional network (FCN) prediction head) by 52.69%. SIT-SAM is also robust in data-constrained environments, delivering a 2.43% improvement with a single point prompt and maintaining effectiveness with multiple prompts, showing a 0.78% gain with ten point prompts. Code is available at https://github.com/wentao0429/SIT-SAM.
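To make the three-block design easier to picture, below is a minimal PyTorch sketch, not the authors' code: the module shapes, the EMA prototype memory, the logit fusion, and the `sam_model(volume, point_prompt)` API are all illustrative assumptions, and the actual implementation lives in the linked repository.

```python
# A minimal, illustrative sketch of the three-block pipeline named in the
# abstract. Every class, shape, and the prototype-memory rule below is an
# assumption for illustration only; the authors' actual implementation is at
# https://github.com/wentao0429/SIT-SAM.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticIntegrationTransformer(nn.Module):
    """Block (2): fuse multi-scale image features with instance-mask geometry
    and assign a semantic class to a SAM-generated instance mask."""

    def __init__(self, feat_dim=256, num_classes=104, num_layers=4):
        super().__init__()
        # Hypothetical: embed the binary mask so its geometry can be added
        # to the image features as a spatial cue.
        self.mask_proj = nn.Conv3d(1, feat_dim, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, multi_scale_feats, instance_mask):
        # multi_scale_feats: list of (B, C, D_i, H_i, W_i) feature volumes;
        # instance_mask: (B, 1, D, H, W) binary mask produced by SAM.
        mask_emb = self.mask_proj(instance_mask.float())
        tokens = []
        for feats in multi_scale_feats:
            # Resample each scale to the mask resolution, inject the mask
            # geometry, then pool to one token per scale so the encoder sees
            # both fine detail (high-res scales) and global context.
            feats = F.interpolate(feats, size=mask_emb.shape[2:])
            tokens.append((feats + mask_emb).flatten(2).mean(-1))  # (B, C)
        x = self.encoder(torch.stack(tokens, dim=1))  # (B, n_scales, C)
        pooled = x.mean(dim=1)
        return pooled, self.classifier(pooled)        # features, class logits


class PrototypeMemory(nn.Module):
    """Block (3): one plausible reading of the 'cognitive science-inspired
    memory' -- per-class prototypes updated by an exponential moving average,
    which helps when only a few labeled volumes are available."""

    def __init__(self, feat_dim=256, num_classes=104, momentum=0.9):
        super().__init__()
        self.register_buffer("prototypes", torch.zeros(num_classes, feat_dim))
        self.momentum = momentum

    @torch.no_grad()
    def update(self, feats, labels):
        # EMA update of the prototype for each observed class label.
        for f, y in zip(feats, labels):
            self.prototypes[y] = (self.momentum * self.prototypes[y]
                                  + (1 - self.momentum) * f)

    def forward(self, feats):
        # Cosine similarity of each instance feature to every class prototype.
        return F.normalize(feats, dim=-1) @ F.normalize(self.prototypes, dim=-1).T


def classify_instance(sam_model, sit, memory, volume, point_prompt):
    # Block (1): the frozen, unmodified SAM turns a point prompt into a
    # class-agnostic instance mask (this sam_model API is assumed).
    instance_mask, multi_scale_feats = sam_model(volume, point_prompt)
    pooled, logits = sit(multi_scale_feats, instance_mask)
    # Assumed fusion: blend transformer logits with memory evidence.
    return logits + memory(pooled)
```

The design point the sketch tries to preserve is that block (1) stays frozen, so SAM’s zero-shot behavior is untouched, while blocks (2) and (3) are trained on top of its masks as pure post-processing.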
Journal overview:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.