Evaluating acute image ordering for real-world patient cases via language model alignment with radiological guidelines.

IF 5.4 Q1 MEDICINE, RESEARCH & EXPERIMENTAL
Michael S Yao, Allison Chae, Piya Saraiya, Charles E Kahn, Walter R Witschey, James C Gee, Hersh Sagreiya, Osbert Bastani
Journal: Communications Medicine, vol. 5, issue 1, article 332. Published 2025-08-04.
DOI: https://doi.org/10.1038/s43856-025-01061-9
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12322208/pdf/
Citations: 0

Abstract

Background: Diagnostic imaging studies are increasingly important in the management of acutely presenting patients. However, ordering appropriate imaging studies in the emergency department is a challenging task with a high degree of variability among healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to recommend diagnostic imaging studies in accordance with evidence-based medical guidelines. However, it remains challenging to ensure that these tools can provide recommendations that correctly align with medical guidelines, especially given the limited diagnostic information available in acute care settings.

Methods: In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that align with the American College of Radiology's Appropriateness Criteria, a set of evidence-based guidelines. To power our experiments, we introduce RadCases, a dataset of over 1500 annotated case summaries reflecting common patient presentations, and apply our framework to enable state-of-the-art language models to reason about appropriate imaging choices.
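
The abstract does not spell out how the framework is implemented, so the following is only a minimal sketch of one way guideline-aligned ordering could be structured: first match a case summary to a guideline topic, then look up the imaging study listed for that topic. Everything named here is hypothetical. `TOY_GUIDELINES`, `pick_topic`, `recommend_imaging`, and `accuracy` are illustrative names, the table entries are placeholders rather than content from the ACR Appropriateness Criteria, and a simple word-overlap heuristic stands in for the language model call.

```python
# Toy guideline table: topic -> imaging study.
# ASSUMPTION: illustrative placeholders only, not ACR Appropriateness Criteria content.
TOY_GUIDELINES = {
    "head trauma": "CT head without contrast",
    "suspected pulmonary embolism": "CT pulmonary angiography",
    "acute ankle injury": "ankle radiograph",
}


def pick_topic(case_summary, topics):
    """Stand-in for the language model step (hypothetical).

    Chooses the topic sharing the most words with the case summary.
    A real system would prompt a language model here instead.
    """
    words = set(case_summary.lower().split())
    return max(topics, key=lambda topic: len(words & set(topic.split())))


def recommend_imaging(case_summary, guidelines=TOY_GUIDELINES):
    """Return (matched topic, imaging study listed for that topic)."""
    topic = pick_topic(case_summary, list(guidelines))
    return topic, guidelines[topic]


def accuracy(labeled_cases, guidelines=TOY_GUIDELINES):
    """Fraction of cases whose recommended study equals the reference study."""
    correct = sum(
        recommend_imaging(summary, guidelines)[1] == reference
        for summary, reference in labeled_cases
    )
    return correct / len(labeled_cases)


if __name__ == "__main__":
    print(recommend_imaging("45-year-old with head trauma after a fall"))
```

Separating topic selection from the table lookup means the final recommendation always comes from the guideline table itself; the model stand-in only chooses which entry applies.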

Results: Using our framework, state-of-the-art language models achieve accuracy comparable to clinicians in ordering imaging studies. Furthermore, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of acute image ordering according to the American College of Radiology's Appropriateness Criteria.

Conclusions: Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision-making in alignment with expert evidence-based guidelines.
