Integrating large language models into radiology workflow: Impact of generating personalized report templates from summary

Impact Factor 3.2 · CAS Zone 3 (Medicine) · JCR Q1, Radiology, Nuclear Medicine & Medical Imaging
Amit Gupta, Manzoor Hussain, Kondaveeti Nikhileshwar, Ashish Rastogi, Krithika Rangarajan
European Journal of Radiology, Volume 189, Article 112198. Published 2025-05-25. DOI: 10.1016/j.ejrad.2025.112198. Available at: https://www.sciencedirect.com/science/article/pii/S0720048X25002840
Citations: 0

Abstract

Objective

To evaluate the feasibility of using large language models (LLMs) to convert radiologist-generated report summaries into personalized report templates, and to assess the impact of this approach on scan reporting time and quality.

Materials and Methods

In this retrospective study, 100 CT scans from oncology patients were randomly divided into two equal sets. Two radiologists generated conventional reports for one set and summary reports for the other, and vice versa. Three LLMs (GPT-4, Google Gemini, and Claude Opus) generated complete reports from the summaries using institution-specific generic templates. Two expert radiologists qualitatively evaluated the radiologist summaries and the LLM-generated reports using the ACR RADPEER scoring system, with the conventional radiologist reports as reference. Reporting times for the conventional and summary-based methods were compared, and the LLM-generated reports were analyzed for errors. Quantitative similarity and linguistic metrics were computed to assess how closely each model's reports aligned with the original radiologist-generated summaries. Statistical analyses were performed using Python 3.0 to identify significant differences in reporting times, error rates, and quantitative metrics.
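The abstract does not name the specific similarity metrics used to compare each LLM-generated report against its source summary. As a minimal, stdlib-only sketch of one such comparison, a character-level similarity ratio via `difflib.SequenceMatcher` (a hypothetical stand-in for the paper's unspecified metrics) might look like:

```python
from difflib import SequenceMatcher

def report_similarity(summary: str, generated_report: str) -> float:
    """Return a 0..1 similarity ratio between a radiologist summary and an
    LLM-generated report (Ratcliff/Obershelp ratio; a stand-in for the
    paper's unspecified similarity metrics)."""
    return SequenceMatcher(None, summary.lower(), generated_report.lower()).ratio()

# Hypothetical toy inputs, for illustration only.
summary = "liver lesion stable, no new metastases"
report = "The liver lesion is stable. No new metastases are seen."
score = report_similarity(summary, report)
print(round(score, 2))
```

In practice, report-level comparisons of this kind are usually complemented by token-level overlap metrics (e.g., ROUGE or BLEU) and linguistic features; the ratio above is only the simplest illustration of the alignment idea.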

Results

The average reporting time was significantly shorter for the summary method (6.76 min) than for the conventional method (8.95 min) (p < 0.005). Among the 100 radiologist summaries, 10 received RADPEER scores worse than 1, of which three were deemed to have clinically significant discrepancies. Only one LLM-generated report received a worse RADPEER score than its corresponding summary. Error frequencies in the LLM-generated reports showed no significant differences across models, with template-related errors being most common (χ2 = 1.146, p = 0.564). Quantitative analysis indicated significant differences in similarity and linguistic metrics among the three LLMs (p < 0.05), reflecting distinct generation patterns.
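The reported chi-square result (χ2 = 1.146, p = 0.564) is internally consistent with a test on 2 degrees of freedom, which would fit a comparison of error frequencies across the three models; the degrees of freedom are an inference here, not stated in the abstract. For df = 2 the chi-square upper-tail probability has the closed form P(X ≥ x) = exp(−x/2), so the pairing can be checked with the standard library alone:

```python
import math

chi2_stat = 1.146  # test statistic reported in the abstract

# For 2 degrees of freedom, the chi-square survival function reduces to
# P(X >= x) = exp(-x / 2), so the p-value follows directly.
p_value = math.exp(-chi2_stat / 2)
print(round(p_value, 3))  # 0.564, matching the reported p-value
```

That the statistic and p-value reproduce each other under df = 2 is a useful sanity check when reading such results, though the actual contingency table is needed to recompute the statistic itself.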

Conclusion

Summary-based scan reporting, along with the use of LLMs to generate complete personalized report templates, can shorten reporting time while maintaining report quality. However, human oversight remains necessary to address errors in the generated reports.

Relevance Statement

Summary-based reporting of radiological studies, combined with the use of large language models to generate tailored reports from generic templates, has the potential to make the workflow more efficient by shortening reporting time while maintaining report quality.


Source journal: European Journal of Radiology. CiteScore 6.70; self-citation rate 3.00%; 398 articles published per year; review time 42 days.
Journal description: European Journal of Radiology is an international journal which aims to communicate to its readers state-of-the-art information on imaging developments in the form of high-quality original research articles and timely reviews of current developments in the field. Its audience includes clinicians at all levels of training, including radiology trainees, newly qualified imaging specialists and the experienced radiologist. Its aim is to inform efficient, appropriate and evidence-based imaging practice to the benefit of patients worldwide.