Grounding DINO-US-SAM: Text-Prompted Multiorgan Segmentation in Ultrasound With LoRA-Tuned Vision–Language Models
Authors: Hamza Rasaee; Taha Koleilat; Hassan Rivaz
Journal: IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 72, no. 10, pp. 1414-1425
Publication date: 2025-09-02
Publication type: Journal Article
DOI: 10.1109/TUFFC.2025.3605285
URL: https://ieeexplore.ieee.org/document/11146904/
Impact factor: 3.7 (JCR Q1, Acoustics)
Open access: No
Citation count: 0
Abstract
Accurate and generalizable object segmentation in ultrasound imaging remains a significant challenge due to anatomical variability, diverse imaging protocols, and limited annotated data. In this study, we propose a prompt-driven vision–language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple organs in ultrasound. A total of 18 public ultrasound datasets, covering the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were used. Fifteen of these datasets were used to fine-tune and validate Grounding DINO on the ultrasound domain with low-rank adaptation (LoRA); the remaining three were held out entirely for testing, to evaluate performance on unseen distributions. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS, on most seen datasets while maintaining strong performance on unseen datasets without additional fine-tuning. These results underscore the promise of VLMs for scalable and robust ultrasound image analysis, reducing dependence on large, organ-specific annotated datasets. We will publish our code on code.sonography.ai after acceptance.
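
The abstract states that Grounding DINO is adapted to ultrasound with low-rank adaptation (LoRA). As a rough illustration of the general LoRA idea only, and not the authors' implementation, the sketch below keeps a pretrained weight matrix W frozen and adds a trainable low-rank update, so the effective weight is W + (alpha / r) · B · A. The layer sizes, rank, scaling value, and the helper names `matmul` and `lora_weight` are all hypothetical choices made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return the effective weight W + (alpha / r) * B @ A of a LoRA-adapted layer."""
    r = len(a)                # rank r = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)      # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


random.seed(0)
d_out, d_in, rank, alpha = 8, 8, 2, 4.0   # hypothetical sizes, not from the paper

# Frozen "pretrained" weight (random stand-in values).
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A starts small and random, B starts at zero,
# so the adapted layer initially behaves exactly like the frozen one.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]

W_init = lora_weight(W, A, B, alpha)

# Pretend one training step changed a single entry of B.
B_trained = [row[:] for row in B]
B_trained[0][0] = 1.0
W_adapted = lora_weight(W, A, B_trained, alpha)

full_params = d_out * d_in            # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters updated by LoRA
print(full_params, lora_params)
```

Because only A and B are trained, the number of updated parameters grows with the rank rather than with the full weight size, which is the usual motivation for LoRA when annotated data are limited.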
About the journal:
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control includes the theory, technology, materials, and applications relating to: (1) the generation, transmission, and detection of ultrasonic waves and related phenomena; (2) medical ultrasound, including hyperthermia, bioeffects, tissue characterization and imaging; (3) ferroelectric, piezoelectric, and piezomagnetic materials, including crystals, polycrystalline solids, films, polymers, and composites; (4) frequency control, timing and time distribution, including crystal oscillators and other means of classical frequency control, and atomic, molecular and laser frequency control standards. Areas of interest range from fundamental studies to the design and/or applications of devices and systems.