R2GenGPT: Radiology Report Generation with frozen LLMs

Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
{"title":"R2GenGPT: Radiology Report Generation with frozen LLMs","authors":"Zhanyu Wang ,&nbsp;Lingqiao Liu ,&nbsp;Lei Wang ,&nbsp;Luping Zhou","doi":"10.1016/j.metrad.2023.100033","DOIUrl":null,"url":null,"abstract":"<div><p>Large Language Models (LLMs) have consistently showcased remarkable generalization capa-bilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, which is a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This innovative approach empowers the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers the following benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while freezing all the parameters of LLM. Second, it exhibits high training efficiency, as it requires the training of an exceptionally minimal number of parameters while achieving rapid convergence. By employing delta tuning, our model only trains 5 ​M parameters (which constitute just 0.07 ​% of the total parameter count) to achieve performance close to the SOTA levels. Our code is available at <span>https://github.com/wang-zhanyu/R2GenGPT</span><svg><path></path></svg>.</p></div>","PeriodicalId":100921,"journal":{"name":"Meta-Radiology","volume":"1 3","pages":"Article 100033"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2950162823000334/pdfft?md5=8d65f61005f1683dede680bdf5f173cd&pid=1-s2.0-S2950162823000334-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Meta-Radiology","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2950162823000334","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large Language Models (LLMs) have consistently showcased remarkable generalization capabilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This approach enables the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers two main benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while keeping all parameters of the LLM frozen. Second, it exhibits high training efficiency: it requires training only a very small number of parameters and converges rapidly. By employing delta tuning, our model trains only 5M parameters (just 0.07% of the total parameter count) to achieve performance close to SOTA levels. Our code is available at https://github.com/wang-zhanyu/R2GenGPT.
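To make the alignment idea concrete, below is a minimal PyTorch sketch of how a lightweight trainable projection can map visual features into a frozen LLM's word-embedding space; the projected features would then be concatenated with text-token embeddings before being fed to the LLM. The class name `VisualAlignment`, the `freeze` helper, and the dimensions are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch, assuming a PyTorch setting: align visual features with a
# frozen LLM's embedding space via a small trainable projection. All names
# and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class VisualAlignment(nn.Module):
    """Trainable projection from visual features to the LLM embedding space."""
    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)  # the only trainable part

    def forward(self, vis_feats: torch.Tensor) -> torch.Tensor:
        # vis_feats: (batch, num_patches, vis_dim) from a frozen vision encoder
        return self.proj(vis_feats)  # (batch, num_patches, llm_dim)

def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients so the module contributes no trainable parameters."""
    for p in module.parameters():
        p.requires_grad = False
    return module

# Hypothetical dimensions: ViT-style features into a LLaMA-style embedding space.
align = VisualAlignment(vis_dim=1024, llm_dim=4096)
trainable = sum(p.numel() for p in align.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.2f}M")  # ~4.20M for this sketch
```

Because only the projection receives gradients while the vision encoder and LLM stay frozen, the trainable parameter count remains in the low millions, consistent in spirit with the 5M-parameter delta-tuning figure quoted in the abstract.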
