Automated MRI protocoling in neuroradiology in the era of large language models
Lara Noelle Reiner, Moudather Chelbi, Leonard Fetscher, Juliane C Stöckel, Christoph Csapó-Schmidt, Shakhnaz Guseynova, Fares Al Mohamad, Keno Kyrill Bressem, Jawed Nawabi, Eberhard Siebert, Mike P Wattjes, Michael Scheel, Aymen Meddeb
Radiologia Medica, published 2025-07-11. DOI: 10.1007/s11547-025-02040-9
Abstract
Purpose: This study investigates how large language models (LLMs) can automate MRI protocoling, a routine task in radiology. It compares an open-source model (Llama 3.1 405B) and a proprietary model (GPT-4o), each with and without retrieval-augmented generation (RAG), a method for incorporating domain-specific knowledge.
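The abstract describes RAG only as a way to bring domain-specific knowledge into the model. A minimal sketch of the retrieval step follows. It assumes a toy bag-of-words cosine retriever and three hypothetical guideline snippets (illustrative placeholders, not clinical recommendations). The study's actual retriever, embedding model, guideline text, and prompt wording are not given in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical institutional guideline snippets (placeholders, not the study's guidelines).
GUIDELINES = {
    "multiple sclerosis follow-up": "MS protocol: FLAIR 3D, T2, DWI, T1 post-contrast if new lesions suspected.",
    "acute stroke": "Stroke protocol: DWI, FLAIR, SWI, TOF angiography; no contrast by default.",
    "brain tumor": "Tumor protocol: T1, T2, FLAIR, DWI, perfusion, T1 post-contrast.",
}


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank guideline entries by similarity to the clinical question; keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(
        GUIDELINES.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[0] + " " + item[1]))),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, k: int = 1) -> str:
    # Augment the prompt with the retrieved guideline text before asking the LLM.
    context = "\n".join(retrieve(question, k))
    return (
        "Use the institutional guidelines below to choose MRI sequences "
        "and decide whether contrast medium is needed.\n"
        f"Guidelines:\n{context}\n\nClinical question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Suspected acute stroke, left-sided weakness"))
```

A production system would more likely use dense embeddings and pass the prompt to the model. The keyword retriever here only shows where institution-specific text enters the request.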
Material and methods: This retrospective study included MRI studies conducted between January and December 2023, together with the institution's protocol assignment guidelines. Clinical questions were extracted, and a neuroradiologist established the gold-standard protocol for each case. The LLMs were tasked with assigning MRI protocols and deciding on contrast medium administration, with and without RAG. Their results were compared with the protocols selected by four radiologists. Token-based symmetric accuracy, the Wilcoxon signed-rank test, and the McNemar test were used for evaluation.
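The abstract names token-based symmetric accuracy but does not define it. One plausible reading is assumed here and is not confirmed by the source. Each protocol is treated as a set of sequence tokens, and the overlap is scored as intersection over union (the Jaccard index). This score is symmetric, because swapping prediction and gold standard does not change it.

```python
def symmetric_accuracy(predicted: list[str], gold: list[str]) -> float:
    # Assumed metric (Jaccard index); the paper's exact definition may differ.
    # Normalize sequence names so "FLAIR" and "flair " count as the same token.
    p = {t.strip().lower() for t in predicted if t.strip()}
    g = {t.strip().lower() for t in gold if t.strip()}
    # Two empty protocols agree perfectly; also avoids dividing by zero.
    if not p and not g:
        return 1.0
    # Shared tokens over all distinct tokens named by either side.
    return len(p & g) / len(p | g)


def mean_accuracy(cases: list[tuple[list[str], list[str]]]) -> float:
    # Average per-case scores over a cohort of (predicted, gold) pairs.
    scores = [symmetric_accuracy(pred, gold) for pred, gold in cases]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = ["T1", "T2", "FLAIR", "DWI"]
    gold = ["T2", "FLAIR", "DWI", "SWI"]
    print(symmetric_accuracy(pred, gold))  # 3 shared / 5 distinct = 0.6
```

Unlike plain recall, this score penalizes both missing and superfluous sequences. That matters when an over-complete protocol would lengthen scan time.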
Results: Data from 100 neuroradiology reports were included (patients' mean age 54.2 ± 18.41 years; 50% women). RAG integration significantly improved the accuracy of sequence and contrast media prediction for Llama 3.1 (sequences: 38% vs. 70%, P < .001; contrast media: 77% vs. 94%, P < .001) and GPT-4o (sequences: 43% vs. 81%, P < .001; contrast media: 79% vs. 92%, P = .006). GPT-4o outperformed Llama 3.1 in MRI sequence prediction (81% vs. 70%, P < .001), with accuracy comparable to that of the radiologists (81% ± 0.21, P = .43). Both models matched the radiologists in predicting contrast media administration (Llama 3.1 with RAG: 94% vs. 91% ± 0.2, P = .37; GPT-4o with RAG: 92% vs. 91% ± 0.24, P = .48).
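The with/without-RAG contrast media comparisons are paired binary outcomes on the same cases (each decision is either correct or not). The McNemar test listed in the methods plausibly applies to this situation, and only the discordant cases enter it. Below is a minimal exact (binomial) version using only the standard library. The abstract does not say whether the authors used the exact or the chi-squared variant, and the toy data are illustrative, not from the study.

```python
from math import comb


def mcnemar_exact(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Two-sided exact McNemar test for paired binary outcomes."""
    if len(correct_a) != len(correct_b):
        raise ValueError("outcomes must be paired case by case")
    # Only discordant pairs carry information about a difference.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # B right, A wrong
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 each discordant pair is a fair coin flip: b ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy data (not from the study): 10 cases, RAG fixes 5 errors and breaks none.
    without_rag = [True] * 5 + [False] * 5
    with_rag = [True] * 10
    print(mcnemar_exact(without_rag, with_rag))  # 2 * (1/32) = 0.0625
```

For paired per-case scores such as the sequence accuracy, the Wilcoxon signed-rank test named in the methods is the usual counterpart. SciPy provides it as `scipy.stats.wilcoxon`.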
Conclusion: Large language models show great potential as decision-support tools for MRI protocoling, with performance similar to that of radiologists. RAG enhances the ability of LLMs to provide accurate, institution-specific protocol recommendations.
About the journal:
Felice Perussia founded La radiologia medica in 1914. It is a peer-reviewed journal and the official journal of the Italian Society of Medical and Interventional Radiology (SIRM). Its primary purpose is to disseminate information related to radiology, especially advances in diagnostic imaging and related disciplines. La radiologia medica welcomes original research on both fundamental and clinical aspects of modern radiology, with a particular focus on diagnostic and interventional imaging techniques. It also covers radiotherapy, nuclear medicine, radiobiology, health physics, and artificial intelligence, with an emphasis on their clinical implications. The journal publishes original articles, review articles, editorials, short reports, and letters to the editor. With an esteemed Editorial Board and a selection of insightful reports, the journal is an indispensable resource for radiologists and professionals in related fields. Ultimately, La radiologia medica aims to serve as a platform for international collaboration and knowledge sharing within the radiological community.