Grounding DINO-US-SAM: Text-Prompted Multiorgan Segmentation in Ultrasound With LoRA-Tuned Vision–Language Models

IF 3.7 | CAS Zone 2 (Engineering & Technology) | JCR Q1 (ACOUSTICS)
Hamza Rasaee;Taha Koleilat;Hassan Rivaz
{"title":"基础DINO-US-SAM:文本提示超声多器官分割与lora调谐视觉语言模型。","authors":"Hamza Rasaee;Taha Koleilat;Hassan Rivaz","doi":"10.1109/TUFFC.2025.3605285","DOIUrl":null,"url":null,"abstract":"Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of grounding DINO using low-rank adaptation (LoRA) to the ultrasound domain, and three were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-segment anything model (SAM), BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on code.sonography.ai after acceptance.","PeriodicalId":13322,"journal":{"name":"IEEE transactions on ultrasonics, ferroelectrics, and frequency control","volume":"72 10","pages":"1414-1425"},"PeriodicalIF":3.7000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Grounding DINO-US-SAM: Text-Prompted Multiorgan Segmentation in Ultrasound With LoRA-Tuned Vision–Language Models\",\"authors\":\"Hamza Rasaee;Taha Koleilat;Hassan Rivaz\",\"doi\":\"10.1109/TUFFC.2025.3605285\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of grounding DINO using low-rank adaptation (LoRA) to the ultrasound domain, and three were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-segment anything model (SAM), BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. 
We will publish our code on code.sonography.ai after acceptance.\",\"PeriodicalId\":13322,\"journal\":{\"name\":\"IEEE transactions on ultrasonics, ferroelectrics, and frequency control\",\"volume\":\"72 10\",\"pages\":\"1414-1425\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on ultrasonics, ferroelectrics, and frequency control\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11146904/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on ultrasonics, ferroelectrics, and frequency control","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11146904/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
Citations: 0

Abstract

Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of grounding DINO using low-rank adaptation (LoRA) to the ultrasound domain, and three were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-segment anything model (SAM), BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on code.sonography.ai after acceptance.
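The pipeline the abstract describes is a two-stage, text-prompted flow: Grounding DINO maps a free-text organ name to bounding boxes, and SAM2 turns those boxes into segmentation masks. Below is a minimal sketch of that flow using the publicly released groundingdino and sam2 packages; the checkpoint paths, example prompt, and thresholds are illustrative placeholders, not the authors' configuration (which additionally LoRA-tunes Grounding DINO, sketched separately below).

```python
# Sketch: text-prompted detect-then-segment, as described in the abstract.
# Grounding DINO proposes boxes from a text prompt; SAM2 refines them into
# masks. Paths, prompt, and thresholds below are placeholders.
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Stage 1: text-conditioned detection.
dino = load_model("groundingdino_config.py", "groundingdino.pth")
image_source, image = load_image("ultrasound_frame.png")
boxes, logits, phrases = predict(
    model=dino,
    image=image,
    caption="thyroid nodule",   # the organ/structure, named in plain text
    box_threshold=0.35,
    text_threshold=0.25,
)

# Grounding DINO returns normalized (cx, cy, w, h); SAM2 expects absolute
# (x1, y1, x2, y2) pixel coordinates.
h, w, _ = image_source.shape
boxes_xyxy = box_convert(
    boxes * torch.tensor([w, h, w, h]), in_fmt="cxcywh", out_fmt="xyxy"
)

# Stage 2: box-prompted segmentation.
sam2 = build_sam2("sam2_config.yaml", "sam2_checkpoint.pt")
predictor = SAM2ImagePredictor(sam2)
with torch.inference_mode():
    predictor.set_image(image_source)  # RGB numpy array from load_image
    masks, scores, _ = predictor.predict(
        box=boxes_xyxy.numpy(), multimask_output=False
    )
```

The detect-then-segment split is what lets one text prompt drive segmentation across organs: only the detector needs to understand language, while the mask decoder stays organ-agnostic.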
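The domain adaptation mentioned in the abstract relies on LoRA (low-rank adaptation): pretrained weight matrices stay frozen and only a low-rank update ΔW = (α/r)·BA is trained, with rank r far smaller than the layer width, so few parameters need ultrasound-specific tuning. A generic PyTorch sketch of a LoRA-wrapped linear layer follows; it illustrates the mechanism only, not the paper's ranks, target layers, or hyperparameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze a pretrained nn.Linear and learn a low-rank residual:
    y = base(x) + (alpha / r) * x A^T B^T. Generic illustration only."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # pretrained weights stay frozen
        # A is small-Gaussian, B is zero, so the update starts at exactly
        # zero and training departs smoothly from the pretrained model.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Only the two low-rank factors train: 2 * r * 256 parameters here,
# versus 256 * 256 for full fine-tuning of the same layer.
layer = LoRALinear(nn.Linear(256, 256), r=8, alpha=16.0)
out = layer(torch.randn(4, 256))
```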
Source Journal
CiteScore: 7.70
Self-citation rate: 16.70%
Articles published: 583
Review time: 4.5 months
Journal description: IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control includes the theory, technology, materials, and applications relating to: (1) the generation, transmission, and detection of ultrasonic waves and related phenomena; (2) medical ultrasound, including hyperthermia, bioeffects, tissue characterization and imaging; (3) ferroelectric, piezoelectric, and piezomagnetic materials, including crystals, polycrystalline solids, films, polymers, and composites; (4) frequency control, timing and time distribution, including crystal oscillators and other means of classical frequency control, and atomic, molecular and laser frequency control standards. Areas of interest range from fundamental studies to the design and/or applications of devices and systems.