{"title":"Enhancing Large Language Models with Retrieval-Augmented Generation: A Radiology-Specific Approach.","authors":"Dane A Weinert, Andreas M Rauschecker","doi":"10.1148/ryai.240313","DOIUrl":null,"url":null,"abstract":"<p><p>Retrieval-augmented generation (RAG) is a strategy to improve the performance of large language models (LLMs) by providing an LLM with an updated corpus of knowledge that can be used for answer generation in real time. RAG may improve LLM performance and clinical applicability in radiology by providing citable, up-to-date information without requiring model fine-tuning. In this retrospective study, a radiology-specific RAG system was developed using a vector database of 3689 <i>RadioGraphics</i> articles published from January 1999 to December 2023. Performance of five LLMs with (RAG-Systems) and without RAG on a 192-question radiology examination was compared. RAG significantly improved examination scores for GPT-4 (OpenAI; 81.2% vs 75.5%, <i>P</i> = .04) and Command R+ (Cohere; 70.3% vs 62.0%, <i>P</i> = .02), but not for Claude Opus (Anthropic), Mixtral (Mistral AI), or Gemini 1.5 Pro (Google DeepMind). RAG-Systems performed significantly better than pure LLMs on a 24-question subset directly sourced from <i>RadioGraphics</i> (85% vs 76%, <i>P</i> = .03). RAG-Systems retrieved 21 of 24 (87.5%, <i>P</i> < .001) relevant <i>RadioGraphics</i> references cited in the examination's answer explanations and successfully cited them in 18 of 21 (85.7%, <i>P</i> < .001) outputs. The results suggest that RAG is a promising approach to enhance LLM capabilities for radiology knowledge tasks, providing transparent, domain-specific information retrieval. <b>Keywords:</b> Computer Applications-General (Informatics), Technology Assessment <i>Supplemental material is available for this article.</i> © RSNA, 2025 See also commentary by Mansuri and Gichoya in this issue.</p>","PeriodicalId":29787,"journal":{"name":"Radiology-Artificial Intelligence","volume":" ","pages":"e240313"},"PeriodicalIF":13.2000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiology-Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1148/ryai.240313","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Retrieval-augmented generation (RAG) is a strategy to improve the performance of large language models (LLMs) by providing an LLM with an updated corpus of knowledge that can be used for answer generation in real time. RAG may improve LLM performance and clinical applicability in radiology by providing citable, up-to-date information without requiring model fine-tuning. In this retrospective study, a radiology-specific RAG system was developed using a vector database of 3689 RadioGraphics articles published from January 1999 to December 2023. Performance of five LLMs with RAG (RAG systems) and without RAG on a 192-question radiology examination was compared. RAG significantly improved examination scores for GPT-4 (OpenAI; 81.2% vs 75.5%, P = .04) and Command R+ (Cohere; 70.3% vs 62.0%, P = .02), but not for Claude Opus (Anthropic), Mixtral (Mistral AI), or Gemini 1.5 Pro (Google DeepMind). RAG systems performed significantly better than pure LLMs on a 24-question subset directly sourced from RadioGraphics (85% vs 76%, P = .03). RAG systems retrieved 21 of 24 (87.5%, P < .001) relevant RadioGraphics references cited in the examination's answer explanations and successfully cited them in 18 of 21 (85.7%, P < .001) outputs. The results suggest that RAG is a promising approach to enhance LLM capabilities for radiology knowledge tasks, providing transparent, domain-specific information retrieval. Keywords: Computer Applications-General (Informatics), Technology Assessment. Supplemental material is available for this article. © RSNA, 2025. See also commentary by Mansuri and Gichoya in this issue.
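The abstract describes a standard retrieve-then-prompt RAG architecture: article text is chunked and indexed in a vector database, the chunks most similar to a question are retrieved, and those chunks are prepended to the prompt so the LLM can answer from, and cite, the retrieved material. The paper's implementation details are not given in this abstract; the sketch below is only a minimal, self-contained illustration of that pattern. TF-IDF similarity stands in for the neural embeddings and vector database a production system would use, and the corpus snippets, question, and top_k value are invented for illustration.

# Minimal retrieve-then-prompt RAG sketch (illustrative only, not the study's code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for chunked radiology article text (invented examples).
chunks = [
    "Hepatic hemangiomas typically show peripheral nodular discontinuous "
    "enhancement with progressive centripetal fill-in on delayed phases.",
    "Focal nodular hyperplasia classically demonstrates a central scar with "
    "delayed enhancement and homogeneous arterial-phase enhancement.",
    "Hepatocellular carcinoma commonly shows arterial hyperenhancement with "
    "washout on portal venous or delayed phases.",
]

question = ("Which benign liver lesion shows peripheral nodular enhancement "
            "with centripetal fill-in?")

# 1) Index: vectorize every chunk (a real system would embed and store these
#    in a vector database instead of using TF-IDF in memory).
vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)

# 2) Retrieve: vectorize the question and rank chunks by cosine similarity.
question_vector = vectorizer.transform([question])
scores = cosine_similarity(question_vector, chunk_vectors)[0]
top_k = 2
best = scores.argsort()[::-1][:top_k]
retrieved = [chunks[i] for i in best]

# 3) Augment: prepend the retrieved context to the question before sending it
#    to an LLM (GPT-4, Claude, etc.). The model call itself is omitted here.
prompt = (
    "Answer the radiology question using only the context below, and cite it.\n\n"
    "Context:\n" + "\n".join(f"- {c}" for c in retrieved) +
    f"\n\nQuestion: {question}"
)
print(prompt)

Because the prompt is assembled from retrieved passages at query time, the same pipeline can cite its sources and be refreshed by re-indexing the corpus, without any fine-tuning of the underlying model.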
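The comparisons reported above are paired: the same questions are answered by each model with and without RAG, and correctness is a binary outcome per question. The abstract does not state which statistical test was used; McNemar's test is a common choice for paired binary data, and the sketch below uses it purely to show how such a comparison could be computed. The marginal totals are chosen to reproduce GPT-4's reported 75.5% vs 81.2% on 192 questions, but the split into discordant cells is invented, so the printed P value is illustrative and not the study's result.

# Hedged sketch: paired accuracy comparison via McNemar's test (assumed test,
# invented discordant counts; only the marginals follow the reported figures).
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: plain LLM correct / incorrect; columns: RAG system correct / incorrect.
# Only the off-diagonal (discordant) cells drive the test.
table = np.array([
    [135, 10],   # both correct, plain-LLM-only correct
    [21,  26],   # RAG-only correct, both incorrect
])
# Marginals: plain LLM 145/192 = 75.5% correct, RAG system 156/192 = 81.2%.

result = mcnemar(table, exact=True)
print(f"Exact McNemar P value: {result.pvalue:.3f}")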