From task-specific to foundation models: A paradigm shift in medical vision-language analysis

IF 12.7 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science Review | Pub Date: 2025-09-26 | DOI: 10.1016/j.cosrev.2025.100831

Muhammad Umair Ali, Amad Zafar, Seonghan Kim, Kwang Su Kim, Seung Won Lee

Computer Science Review, volume 59, Article 100831. Publisher page: https://www.sciencedirect.com/science/article/pii/S1574013725001078

Citations: 0

Abstract

Integrating vision-language models (VLMs) into medical imaging drives a paradigm shift from task-specific systems toward generalist foundation models (FMs) capable of zero-shot and few-shot reasoning across diverse clinical domains. This review presents a comprehensive model-centric taxonomy, categorizing over 135 studies into three key developmental stages: (1) task-specific VLMs, (2) modular/adapter-based/prompt-tuned VLMs, and (3) foundation models. We systematically assess each category with respect to architectural innovations, learning paradigms, clinical applications, and evaluation metrics. Our analysis reveals that recent advances in multimodal contrastive learning, prompt engineering, and scalable transformer-based architectures significantly enhance generalizability, data efficiency, and multimodal interpretability in medical AI. Furthermore, we synthesize bibliometric trends and delineate methodological transitions through a PRISMA-based systematic review. The review concludes with a discussion of the challenges and provides a roadmap for developing clinically reliable, data-efficient, and versatile VLMs, highlighting their transformative potential for improving diagnostic accuracy, workflow automation, and decision support in healthcare.

Source journal

Computer Science Review (Computer Science: General Computer Science)

CiteScore

32.70

Self-citation rate

0.00%

Articles published

Review time

51 days

About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if they advance the fundamental understanding of those methods.