CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning.

Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention
Pub Date: 2024-10-01; Epub Date: 2024-10-23; DOI: 10.1007/978-3-031-72390-2_44

Yuexi Du, Brian Chang, Nicha C Dvornek

Volume 15012, pp. 465-475. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11709740/pdf/


Abstract

Recent advancements in Contrastive Language-Image Pre-training (CLIP) [21] have demonstrated notable success in self-supervised representation learning across various tasks. However, existing CLIP-like approaches often demand extensive GPU resources and prolonged training times because of the considerable size of the model and dataset, which makes them poorly suited to medical applications, where large datasets are often unavailable. Meanwhile, the language model prompts are mainly derived manually from the labels tied to images, potentially overlooking the richness of information within the training samples. We introduce a novel language-image Contrastive Learning method with an Efficient large language model and prompt Fine-Tuning (CLEFT) that harnesses the strengths of extensively pre-trained language and visual models. Furthermore, we present an efficient strategy for learning context-based prompts that narrows the gap between informative clinical diagnostic data and simple class labels. Our method demonstrates state-of-the-art performance on multiple chest X-ray and mammography datasets compared with various baselines. The proposed parameter-efficient framework reduces the total trainable model size by 39% and shrinks the trainable language model to only 4% of that of the current BERT encoder.
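For readers unfamiliar with the CLIP-style objective the abstract builds on, the following is a minimal sketch of a generic symmetric contrastive (InfoNCE) loss over paired image and text embeddings. It is not the authors' implementation: the function names, the toy embeddings, and the temperature value of 0.07 are assumptions chosen for illustration, and the abstract does not specify CLEFT's exact loss or hyperparameters.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: image i should match text i, and vice versa.

    `temperature` is an illustrative assumption, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: temperature-scaled cosine similarity of image i and text j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: each row is a classification over the texts.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each column is a classification over the images.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch: correctly paired embeddings versus the same texts in swapped order.
aligned_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

On the toy batch, the correctly paired embeddings yield a much smaller loss than the swapped ones, which is the behavior the objective rewards during training.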



