Grounding DINO-US-SAM: Text-Prompted Multiorgan Segmentation in Ultrasound With LoRA-Tuned Vision–Language Models

IF 3.7 | CAS Zone 2 (Engineering & Technology) | JCR Q1 (ACOUSTICS)
Hamza Rasaee;Taha Koleilat;Hassan Rivaz
{"title":"基础DINO-US-SAM:文本提示超声多器官分割与lora调谐视觉语言模型。","authors":"Hamza Rasaee;Taha Koleilat;Hassan Rivaz","doi":"10.1109/TUFFC.2025.3605285","DOIUrl":null,"url":null,"abstract":"Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of grounding DINO using low-rank adaptation (LoRA) to the ultrasound domain, and three were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-segment anything model (SAM), BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on code.sonography.ai after acceptance.","PeriodicalId":13322,"journal":{"name":"IEEE transactions on ultrasonics, ferroelectrics, and frequency control","volume":"72 10","pages":"1414-1425"},"PeriodicalIF":3.7000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Grounding DINO-US-SAM: Text-Prompted Multiorgan Segmentation in Ultrasound With LoRA-Tuned Vision–Language Models\",\"authors\":\"Hamza Rasaee;Taha Koleilat;Hassan Rivaz\",\"doi\":\"10.1109/TUFFC.2025.3605285\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of grounding DINO using low-rank adaptation (LoRA) to the ultrasound domain, and three were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-segment anything model (SAM), BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. 
We will publish our code on code.sonography.ai after acceptance.\",\"PeriodicalId\":13322,\"journal\":{\"name\":\"IEEE transactions on ultrasonics, ferroelectrics, and frequency control\",\"volume\":\"72 10\",\"pages\":\"1414-1425\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on ultrasonics, ferroelectrics, and frequency control\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11146904/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on ultrasonics, ferroelectrics, and frequency control","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11146904/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
Citations: 0

Abstract

Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. These datasets were divided into 15 for fine-tuning and validation of grounding DINO using low-rank adaptation (LoRA) to the ultrasound domain, and three were held out entirely for testing to evaluate performance in unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-segment anything model (SAM), BiomedParse, and SAMUS on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs in scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on code.sonography.ai after acceptance.
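The pipeline the abstract describes is a two-stage, text-prompted flow: Grounding DINO maps a free-text organ name to bounding boxes, and SAM2 turns those boxes into segmentation masks. Below is a minimal sketch of that flow using the publicly released groundingdino and sam2 packages; the checkpoint paths, example prompt, and thresholds are illustrative placeholders, not the authors' configuration (which additionally LoRA-tunes Grounding DINO, sketched separately below).

```python
# Sketch: text-prompted detect-then-segment, as described in the abstract.
# Grounding DINO proposes boxes from a text prompt; SAM2 refines them into
# masks. Paths, prompt, and thresholds below are placeholders.
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Stage 1: text-conditioned detection.
dino = load_model("groundingdino_config.py", "groundingdino.pth")
image_source, image = load_image("ultrasound_frame.png")
boxes, logits, phrases = predict(
    model=dino,
    image=image,
    caption="thyroid nodule",   # the organ/structure, named in plain text
    box_threshold=0.35,
    text_threshold=0.25,
)

# Grounding DINO returns normalized (cx, cy, w, h); SAM2 expects absolute
# (x1, y1, x2, y2) pixel coordinates.
h, w, _ = image_source.shape
boxes_xyxy = box_convert(
    boxes * torch.tensor([w, h, w, h]), in_fmt="cxcywh", out_fmt="xyxy"
)

# Stage 2: box-prompted segmentation.
sam2 = build_sam2("sam2_config.yaml", "sam2_checkpoint.pt")
predictor = SAM2ImagePredictor(sam2)
with torch.inference_mode():
    predictor.set_image(image_source)  # RGB numpy array from load_image
    masks, scores, _ = predictor.predict(
        box=boxes_xyxy.numpy(), multimask_output=False
    )
```

The detect-then-segment split is what lets one text prompt drive segmentation across organs: only the detector needs to understand language, while the mask decoder stays organ-agnostic.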
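The domain adaptation mentioned in the abstract relies on LoRA (low-rank adaptation): pretrained weight matrices stay frozen and only a low-rank update ΔW = (α/r)·BA is trained, with rank r far smaller than the layer width, so few parameters need ultrasound-specific tuning. A generic PyTorch sketch of a LoRA-wrapped linear layer follows; it illustrates the mechanism only, not the paper's ranks, target layers, or hyperparameters.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Freeze a pretrained nn.Linear and learn a low-rank residual:
    y = base(x) + (alpha / r) * x A^T B^T. Generic illustration only."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False        # pretrained weights stay frozen
        # A is small-Gaussian, B is zero, so the update starts at exactly
        # zero and training departs smoothly from the pretrained model.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Only the two low-rank factors train: 2 * r * 256 parameters here,
# versus 256 * 256 for full fine-tuning of the same layer.
layer = LoRALinear(nn.Linear(256, 256), r=8, alpha=16.0)
out = layer(torch.randn(4, 256))
```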
Source Journal
CiteScore: 7.70
Self-citation rate: 16.70%
Articles published: 583
Review time: 4.5 months
Journal description: IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control includes the theory, technology, materials, and applications relating to: (1) the generation, transmission, and detection of ultrasonic waves and related phenomena; (2) medical ultrasound, including hyperthermia, bioeffects, tissue characterization and imaging; (3) ferroelectric, piezoelectric, and piezomagnetic materials, including crystals, polycrystalline solids, films, polymers, and composites; (4) frequency control, timing and time distribution, including crystal oscillators and other means of classical frequency control, and atomic, molecular and laser frequency control standards. Areas of interest range from fundamental studies to the design and/or applications of devices and systems.