Vision-language foundation models for medical imaging: a review of current practices and innovations.

IF 2.8 · CAS Zone 4 (Medicine) · JCR Q2 (ENGINEERING, BIOMEDICAL)
Biomedical Engineering Letters · Published: 2025-06-06 · eCollection date: 2025-09-01 · DOI: 10.1007/s13534-025-00484-6
Ji Seung Ryu, Hyunyoung Kang, Yuseong Chu, Sejung Yang
{"title":"Vision-language foundation models for medical imaging: a review of current practices and innovations.","authors":"Ji Seung Ryu, Hyunyoung Kang, Yuseong Chu, Sejung Yang","doi":"10.1007/s13534-025-00484-6","DOIUrl":null,"url":null,"abstract":"<p><p>Foundation models, including large language models and vision-language models (VLMs), have revolutionized artificial intelligence by enabling efficient, scalable, and multimodal learning across diverse applications. By leveraging advancements in self-supervised and semi-supervised learning, these models integrate computer vision and natural language processing to address complex tasks, such as disease classification, segmentation, cross-modal retrieval, and automated report generation. Their ability to pretrain on vast, uncurated datasets minimizes reliance on annotated data while improving generalization and adaptability for a wide range of downstream tasks. In the medical domain, foundation models address critical challenges by combining the information from various medical imaging modalities with textual data from radiology reports and clinical notes. This integration has enabled the development of tools that streamline diagnostic workflows, enhance accuracy (ACC), and enable robust decision-making. This review provides a systematic examination of the recent advancements in medical VLMs from 2022 to 2024, focusing on modality-specific approaches and tailored applications in medical imaging. The key contributions include the creation of a structured taxonomy to categorize existing models, an in-depth analysis of datasets essential for training and evaluation, and a review of practical applications. This review also addresses ongoing challenges and proposes future directions for enhancing the accessibility and impact of foundation models in healthcare.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s13534-025-00484-6.</p>","PeriodicalId":46898,"journal":{"name":"Biomedical Engineering Letters","volume":"15 5","pages":"809-830"},"PeriodicalIF":2.8000,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411343/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Engineering Letters","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s13534-025-00484-6","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/9/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0

Abstract

Foundation models, including large language models and vision-language models (VLMs), have revolutionized artificial intelligence by enabling efficient, scalable, and multimodal learning across diverse applications. By leveraging advances in self-supervised and semi-supervised learning, these models integrate computer vision and natural language processing to address complex tasks such as disease classification, segmentation, cross-modal retrieval, and automated report generation. Their ability to pretrain on vast, uncurated datasets minimizes reliance on annotated data while improving generalization and adaptability across a wide range of downstream tasks. In the medical domain, foundation models address critical challenges by combining information from various medical imaging modalities with textual data from radiology reports and clinical notes. This integration has enabled tools that streamline diagnostic workflows, improve accuracy, and support robust decision-making. This review provides a systematic examination of recent advances in medical VLMs from 2022 to 2024, focusing on modality-specific approaches and tailored applications in medical imaging. The key contributions include a structured taxonomy that categorizes existing models, an in-depth analysis of the datasets essential for training and evaluation, and a review of practical applications. The review also addresses ongoing challenges and proposes future directions for enhancing the accessibility and impact of foundation models in healthcare.

Supplementary information: The online version contains supplementary material available at 10.1007/s13534-025-00484-6.
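
To make the mechanism the abstract describes concrete, the sketch below illustrates CLIP-style contrastive image-text alignment, the pretraining objective underlying many of the medical VLMs this review surveys. It is a minimal, hypothetical PyTorch example: the projection heads, dimensions, and the random features standing in for encoder outputs are illustrative assumptions, not the design of any specific model discussed in the paper.

# Minimal sketch of CLIP-style image-text contrastive alignment.
# All names and dimensions are illustrative assumptions, not taken
# from any model in the reviewed literature.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVLM(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, embed_dim=256):
        super().__init__()
        # Projection heads mapping each modality into a shared
        # embedding space (stand-ins for full encoder backbones).
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Learnable temperature, as in CLIP.
        self.log_temp = nn.Parameter(torch.tensor(0.07).log())

    def forward(self, img_feats, txt_feats):
        # L2-normalize so the dot product is cosine similarity.
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        # Pairwise similarity matrix, scaled by temperature.
        return img @ txt.t() / self.log_temp.exp()

def contrastive_loss(logits):
    # Matched image-report pairs lie on the diagonal; train both
    # directions (image->text and text->image) to recover them.
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Usage with random features standing in for encoder outputs:
model = ToyVLM()
img_feats = torch.randn(8, 512)   # e.g., pooled imaging features
txt_feats = torch.randn(8, 768)   # e.g., pooled report features
loss = contrastive_loss(model(img_feats, txt_feats))
loss.backward()

In a real medical VLM, img_feats and txt_feats would come from a vision backbone over scans and a text encoder over the paired radiology reports; this contrastive objective is what allows pretraining on large, uncurated image-report pairs without manual labels, and it directly supports downstream tasks such as cross-modal retrieval.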

Source journal
Biomedical Engineering Letters (ENGINEERING, BIOMEDICAL)
CiteScore: 6.80 · Self-citation rate: 0.00% · Annual publication volume: 34

Journal description: Biomedical Engineering Letters (BMEL) aims to present innovative experimental science and technological developments in the biomedical field, as well as clinical applications of new developments. Articles must contain original biomedical engineering content, defined as the development, theoretical analysis, and evaluation/validation of a new technique. BMEL publishes the following types of papers: original articles, review articles, editorials, and letters to the editor. All papers are reviewed in single-blind fashion.