CLIP in medical imaging: A survey

Impact factor 10.7 · Zone 1, Medicine · JCR Q1: Computer Science, Artificial Intelligence

Medical Image Analysis · Pub Date: 2025-03-22 · DOI: 10.1016/j.media.2025.103551

Zihao Zhao, Yuxiao Liu, Han Wu, Mei Wang, Yonghao Li, Sheng Wang, Lin Teng, Disheng Liu, Zhiming Cui, Qian Wang, Dinggang Shen

Volume 102, Article 103551. Full text: https://www.sciencedirect.com/science/article/pii/S1361841525000982

Citations: 0

Abstract

Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. It has shown promising results across various tasks due to its generalizability and interpretability. CLIP has recently attracted increasing interest in the medical imaging domain, serving either as a pre-training paradigm for image–text alignment or as a critical component in diverse clinical tasks. To facilitate a deeper understanding of this promising direction, this survey offers an in-depth exploration of CLIP within medical imaging, covering both refined CLIP pre-training and CLIP-driven applications. In this paper, we (1) start with a brief introduction to the fundamentals of the CLIP methodology; (2) investigate the adaptation of CLIP pre-training to the medical imaging domain, focusing on how to optimize CLIP given the characteristics of medical images and reports; (3) explore the practical use of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks; and (4) discuss existing limitations of CLIP in the context of medical imaging and propose forward-looking directions to address the demands of the medical imaging domain. Studies of both technical and practical value are covered. We expect this survey to provide researchers with a holistic understanding of the CLIP paradigm and its potential implications. The project page of this survey can also be found on GitHub.
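
For readers new to the paradigm, the image–text alignment objective mentioned in the abstract can be sketched as a symmetric contrastive loss over a batch of paired embeddings. The following is a minimal, standard-library-only illustration and not code from the survey; the function names, the toy two-dimensional embeddings, and the fixed temperature of 0.07 are assumptions made for this example (in practice the temperature is typically a learned parameter and the embeddings come from trained image and text encoders).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; image_embs[i] is assumed to pair with text_embs[i].

    The temperature value is an illustrative assumption, not taken from the survey.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    image_to_text = cross_entropy_diagonal(logits)
    text_to_image = cross_entropy_diagonal(logits_t)
    return 0.5 * (image_to_text + text_to_image)


# Toy batch: matched pairs point in the same direction, mismatched ones do not.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_contrastive_loss(images, [[2.0, 0.0], [0.0, 3.0]])
shuffled = clip_contrastive_loss(images, [[0.0, 3.0], [2.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each image is most similar to its own report and high when the pairing is scrambled, which is the behaviour that pre-training optimizes toward.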

Source journal: Medical Image Analysis (Engineering & Technology; Engineering, Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.