R2GenGPT: Radiology Report Generation with frozen LLMs

Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
{"title":"R2GenGPT: Radiology Report Generation with frozen LLMs","authors":"Zhanyu Wang ,&nbsp;Lingqiao Liu ,&nbsp;Lei Wang ,&nbsp;Luping Zhou","doi":"10.1016/j.metrad.2023.100033","DOIUrl":null,"url":null,"abstract":"<div><p>Large Language Models (LLMs) have consistently showcased remarkable generalization capa-bilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, which is a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This innovative approach empowers the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers the following benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while freezing all the parameters of LLM. Second, it exhibits high training efficiency, as it requires the training of an exceptionally minimal number of parameters while achieving rapid convergence. By employing delta tuning, our model only trains 5 ​M parameters (which constitute just 0.07 ​% of the total parameter count) to achieve performance close to the SOTA levels. Our code is available at <span>https://github.com/wang-zhanyu/R2GenGPT</span><svg><path></path></svg>.</p></div>","PeriodicalId":100921,"journal":{"name":"Meta-Radiology","volume":"1 3","pages":"Article 100033"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2950162823000334/pdfft?md5=8d65f61005f1683dede680bdf5f173cd&pid=1-s2.0-S2950162823000334-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Meta-Radiology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2950162823000334","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Models (LLMs) have consistently showcased remarkable generalization capabilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This approach enables the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers two main benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while keeping all parameters of the LLM frozen. Second, it exhibits high training efficiency: it requires training only a very small number of parameters and converges rapidly. By employing delta tuning, our model trains only 5M parameters (just 0.07% of the total parameter count) to achieve performance close to SOTA levels. Our code is available at https://github.com/wang-zhanyu/R2GenGPT.
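To make the alignment idea concrete, below is a minimal PyTorch sketch of how a lightweight trainable projection can map visual features into a frozen LLM's word-embedding space; the projected features would then be concatenated with text-token embeddings before being fed to the LLM. The class name `VisualAlignment`, the `freeze` helper, and the dimensions are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch, assuming a PyTorch setting: align visual features with a
# frozen LLM's embedding space via a small trainable projection. All names
# and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class VisualAlignment(nn.Module):
    """Trainable projection from visual features to the LLM embedding space."""
    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)  # the only trainable part

    def forward(self, vis_feats: torch.Tensor) -> torch.Tensor:
        # vis_feats: (batch, num_patches, vis_dim) from a frozen vision encoder
        return self.proj(vis_feats)  # (batch, num_patches, llm_dim)

def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients so the module contributes no trainable parameters."""
    for p in module.parameters():
        p.requires_grad = False
    return module

# Hypothetical dimensions: ViT-style features into a LLaMA-style embedding space.
align = VisualAlignment(vis_dim=1024, llm_dim=4096)
trainable = sum(p.numel() for p in align.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.2f}M")  # ~4.20M for this sketch
```

Because only the projection receives gradients while the vision encoder and LLM stay frozen, the trainable parameter count remains in the low millions, consistent in spirit with the 5M-parameter delta-tuning figure quoted in the abstract.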
