An orchestration learning framework for ultrasound imaging: Prompt-Guided Hyper-Perception and Attention-Matching Downstream Synchronization

IF 10.7 | CAS Zone 1 (Medicine) | Q1 Computer Science, Artificial Intelligence
Zehui Lin, Shuo Li, Shanshan Wang, Zhifan Gao, Yue Sun, Chan-Tong Lam, Xindi Hu, Xin Yang, Dong Ni, Tao Tan
{"title":"An orchestration learning framework for ultrasound imaging: Prompt-Guided Hyper-Perception and Attention-Matching Downstream Synchronization","authors":"Zehui Lin ,&nbsp;Shuo Li ,&nbsp;Shanshan Wang ,&nbsp;Zhifan Gao ,&nbsp;Yue Sun ,&nbsp;Chan-Tong Lam ,&nbsp;Xindi Hu ,&nbsp;Xin Yang ,&nbsp;Dong Ni ,&nbsp;Tao Tan","doi":"10.1016/j.media.2025.103639","DOIUrl":null,"url":null,"abstract":"<div><div>Ultrasound imaging is pivotal in clinical diagnostics due to its affordability, portability, safety, real-time capability, and non-invasive nature. It is widely utilized for examining various organs, such as the breast, thyroid, ovary, cardiac, and more. However, the manual interpretation and annotation of ultrasound images are time-consuming and prone to variability among physicians. While single-task artificial intelligence (AI) solutions have been explored, they are not ideal for scaling AI applications in medical imaging. Foundation models, although a trending solution, often struggle with real-world medical datasets due to factors such as noise, variability, and the incapability of flexibly aligning prior knowledge with task adaptation. To address these limitations, we propose an orchestration learning framework named PerceptGuide for general-purpose ultrasound classification and segmentation. Our framework incorporates a novel orchestration mechanism based on prompted hyper-perception, which adapts to the diverse inductive biases required by different ultrasound datasets. Unlike self-supervised pre-trained models, which require extensive fine-tuning, our approach leverages supervised pre-training to directly capture task-relevant features, providing a stronger foundation for multi-task and multi-organ ultrasound imaging. To support this research, we compiled a large-scale Multi-task, Multi-organ public ultrasound dataset (M<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>-US), featuring images from 9 organs and 16 datasets, encompassing both classification and segmentation tasks. Our approach employs four specific prompts—Object, Task, Input, and Position—to guide the model, ensuring task-specific adaptability. Additionally, a downstream synchronization training stage is introduced to fine-tune the model for new data, significantly improving generalization capabilities and enabling real-world applications. Experimental results demonstrate the robustness and versatility of our framework in handling multi-task and multi-organ ultrasound image processing, outperforming both specialist models and existing general AI solutions. Compared to specialist models, our method improves segmentation from 82.26% to 86.45%, classification from 71.30% to 79.08%, while also significantly reducing model parameters.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"104 ","pages":"Article 103639"},"PeriodicalIF":10.7000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1361841525001860","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Ultrasound imaging is pivotal in clinical diagnostics due to its affordability, portability, safety, real-time capability, and non-invasive nature. It is widely used to examine various organs, including the breast, thyroid, ovary, and heart. However, manual interpretation and annotation of ultrasound images are time-consuming and prone to inter-physician variability. While single-task artificial intelligence (AI) solutions have been explored, they are not ideal for scaling AI applications in medical imaging. Foundation models, although a trending solution, often struggle with real-world medical datasets due to factors such as noise, variability, and an inability to flexibly align prior knowledge with task adaptation. To address these limitations, we propose an orchestration learning framework named PerceptGuide for general-purpose ultrasound classification and segmentation. Our framework incorporates a novel orchestration mechanism based on prompted hyper-perception, which adapts to the diverse inductive biases required by different ultrasound datasets. Unlike self-supervised pre-trained models, which require extensive fine-tuning, our approach leverages supervised pre-training to directly capture task-relevant features, providing a stronger foundation for multi-task and multi-organ ultrasound imaging. To support this research, we compiled a large-scale Multi-task, Multi-organ public ultrasound dataset (M²-US), featuring images from 9 organs and 16 datasets and encompassing both classification and segmentation tasks. Our approach employs four specific prompts (Object, Task, Input, and Position) to guide the model, ensuring task-specific adaptability. Additionally, a downstream synchronization training stage is introduced to fine-tune the model on new data, significantly improving generalization and enabling real-world applications. Experimental results demonstrate the robustness and versatility of our framework in multi-task, multi-organ ultrasound image processing, outperforming both specialist models and existing general-purpose AI solutions. Compared to specialist models, our method improves the average segmentation score from 82.26% to 86.45% and the average classification score from 71.30% to 79.08%, while also significantly reducing the number of model parameters.
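The abstract does not detail how the four prompts condition the network, but the mechanism it describes (learned Object, Task, Input, and Position prompts steering a shared backbone with classification and segmentation heads) can be illustrated schematically. The following is a minimal PyTorch sketch under stated assumptions: every class name, dimension, and the additive prompt-fusion strategy are illustrative choices, not the paper's actual PerceptGuide implementation.

```python
# Minimal sketch of prompt-guided multi-task conditioning (assumed design,
# not the paper's actual architecture).
import torch
import torch.nn as nn

class PromptGuidedModel(nn.Module):
    """Shared backbone conditioned on four prompt types: Object (organ),
    Task (classification vs. segmentation), Input (source dataset), and
    Position (a learned global positional bias). All sizes are hypothetical."""

    def __init__(self, num_objects=9, num_inputs=16, embed_dim=64, num_classes=2):
        super().__init__()
        # One learnable embedding table per prompt type.
        self.object_prompt = nn.Embedding(num_objects, embed_dim)
        self.task_prompt = nn.Embedding(2, embed_dim)        # 0: cls, 1: seg
        self.input_prompt = nn.Embedding(num_inputs, embed_dim)
        self.position_prompt = nn.Parameter(torch.zeros(1, embed_dim))
        # Stand-in convolutional backbone (the paper's encoder is unspecified here).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.cls_head = nn.Linear(embed_dim, num_classes)
        self.seg_head = nn.Conv2d(embed_dim, 1, kernel_size=1)

    def forward(self, image, object_id, task_id, input_id):
        feats = self.backbone(image)                         # (B, C, H, W)
        # Fuse the four prompts additively -- one plausible choice.
        prompt = (self.object_prompt(object_id)
                  + self.task_prompt(task_id)
                  + self.input_prompt(input_id)
                  + self.position_prompt)                    # (B, C)
        feats = feats + prompt[:, :, None, None]             # channel-wise conditioning
        if task_id[0].item() == 0:                           # classification branch
            return self.cls_head(feats.mean(dim=(2, 3)))     # (B, num_classes)
        return self.seg_head(feats)                          # (B, 1, H, W) seg logits

# Example: segment a batch of two breast images from a hypothetical dataset #3.
model = PromptGuidedModel()
x = torch.randn(2, 1, 64, 64)                                # grayscale ultrasound
seg_logits = model(x, object_id=torch.tensor([0, 0]),
                   task_id=torch.tensor([1, 1]),
                   input_id=torch.tensor([3, 3]))
print(seg_logits.shape)                                      # torch.Size([2, 1, 64, 64])
```

In this sketch a single forward pass routes to one head based on the Task prompt; the paper's attention-matching downstream synchronization stage, which fine-tunes the model for new data, is not modeled here.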


Source Journal
Medical Image Analysis (Engineering & Technology - Biomedical Engineering)
CiteScore: 22.10
Self-citation rate: 6.40%
Articles per year: 309
Review time: 6.6 months
About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.