From task-specific to foundation models: A paradigm shift in medical vision-language analysis

Impact factor: 12.7 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Information Systems)
Muhammad Umair Ali, Amad Zafar, Seonghan Kim, Kwang Su Kim, Seung Won Lee
DOI: 10.1016/j.cosrev.2025.100831
Journal: Computer Science Review, vol. 59, Article 100831
Published: 2025-09-26
Publication type: Journal Article
Open access: No
URL: https://www.sciencedirect.com/science/article/pii/S1574013725001078
Citations: 0

Abstract

Integrating vision-language models (VLMs) into medical imaging is driving a paradigm shift from task-specific systems toward generalist foundation models (FMs) capable of zero-shot and few-shot reasoning across diverse clinical domains. This review presents a comprehensive model-centric taxonomy, categorizing over 135 studies into three key developmental stages: (1) task-specific VLMs, (2) modular, adapter-based, or prompt-tuned VLMs, and (3) foundation models. We systematically assess each category in terms of architectural innovations, learning paradigms, clinical applications, and evaluation metrics. Our analysis reveals that recent advances in multimodal contrastive learning, prompt engineering, and scalable transformer-based architectures significantly enhance generalizability, data efficiency, and multimodal interpretability in medical AI. Furthermore, we synthesize bibliometric trends and delineate methodological transitions through a PRISMA-based systematic review. The review concludes with a discussion of the remaining challenges and provides a roadmap for developing clinically reliable, data-efficient, and versatile VLMs, highlighting their transformative potential for improving diagnostic accuracy, workflow automation, and decision support in healthcare.
Source journal: Computer Science Review (Computer Science: General Computer Science)
CiteScore: 32.70
Self-citation rate: 0.00%
Articles published: 26
Review time: 51 days
About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known computer science methods to other areas are in scope only if they advance the fundamental understanding of those methods.