An Institutional Large Language Model for Musculoskeletal MRI Improves Protocol Adherence and Accuracy.

James Thomas Patrick Decourcy Hallinan, Naomi Wenxin Leow, Yi Xian Low, Aric Lee, Wilson Ong, Matthew Ding Zhou Chan, Ganakirthana Kalpenya Devi, Stephanie Shengjie He, Daniel De-Liang Loh, Desmond Shi Wei Lim, Xi Zhen Low, Ee Chin Teo, Shaheryar Mohammad Furqan, Wilson Wei Yang Tham, Jiong Hao Tan, Naresh Kumar, Andrew Makmur, Ting Yonghan
The Journal of Bone & Joint Surgery. Journal Article. Published 2025-07-08. DOI: 10.2106/jbjs.24.01429 (https://doi.org/10.2106/jbjs.24.01429)

Abstract

BACKGROUND
Privacy-preserving large language models (PP-LLMs) hold potential for assisting clinicians with documentation. We evaluated a PP-LLM to improve the clinical information on radiology request forms for musculoskeletal magnetic resonance imaging (MRI) and to automate protocoling, which ensures that the most appropriate imaging is performed.

METHODS
The present retrospective study included musculoskeletal MRI radiology request forms that had been randomly collected from June to December 2023. Studies without electronic medical record (EMR) entries were excluded. An institutional PP-LLM (Claude Sonnet 3.5) augmented the original radiology request forms by mining EMRs and, in combination with rule-based processing of the LLM outputs, suggested appropriate protocols using institutional guidelines. Clinical information on the original and PP-LLM radiology request forms was compared with use of RI-RADS (Reason for exam Imaging Reporting and Data System) grading by 2 musculoskeletal (MSK) radiologists independently (MSK1, with 13 years of experience, and MSK2, with 11 years of experience). These radiologists established a consensus reference standard for protocoling, against which the PP-LLM and 2 second-year board-certified radiologists (RAD1 and RAD2) were compared. Inter-rater reliability was assessed with use of the Gwet AC1, and the percentage agreement with the reference standard was calculated.

RESULTS
Overall, 500 musculoskeletal MRI radiology request forms were analyzed for 407 patients (202 women and 205 men with a mean age [and standard deviation] of 50.3 ± 19.5 years) across a range of anatomical regions, including the spine/pelvis (143 MRI scans; 28.6%), upper extremity (169 scans; 33.8%), and lower extremity (188 scans; 37.6%). Two hundred and twenty-two (44.4%) of the 500 MRI scans required contrast. The clinical information provided in the PP-LLM-augmented radiology request forms was rated as superior to that in the original requests. Only 0.4% to 0.6% of PP-LLM radiology request forms were rated as limited/deficient, compared with 12.4% to 22.6% of the original requests (p < 0.001). Almost-perfect inter-rater reliability was observed for LLM-enhanced requests (AC1 = 0.99; 95% confidence interval [CI], 0.99 to 1.0), compared with substantial agreement for the original forms (AC1 = 0.62; 95% CI, 0.56 to 0.67). For protocoling, MSK1 and MSK2 showed almost-perfect agreement on the region/coverage (AC1 = 0.96; 95% CI, 0.95 to 0.98) and contrast requirement (AC1 = 0.98; 95% CI, 0.97 to 0.99). Compared with the consensus reference standard, protocoling accuracy for the PP-LLM was 95.8% (95% CI, 94.0% to 97.6%), which was significantly higher than that for both RAD1 (88.6%; 95% CI, 85.8% to 91.4%) and RAD2 (88.2%; 95% CI, 85.4% to 91.0%) (p < 0.001 for both).

CONCLUSIONS
Musculoskeletal MRI request form augmentation with an institutional LLM provided superior clinical information and improved protocoling accuracy compared with clinician requests and non-MSK-trained radiologists. Institutional adoption of such LLMs could enhance the appropriateness of MRI utilization and patient care.

LEVEL OF EVIDENCE
Diagnostic Level III. See Instructions for Authors for a complete description of levels of evidence.
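
The two statistics named in the abstract, Gwet's AC1 and percentage agreement with a confidence interval, can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names `gwet_ac1` and `wald_ci` are hypothetical, the AC1 function covers only the two-rater case, and the use of a normal-approximation (Wald) interval with n = 500 is an assumption: the abstract does not state which interval method was used, although the reported accuracy intervals are numerically consistent with it.

```python
import math


def gwet_ac1(rater1, rater2, categories=None):
    """Gwet's AC1 for two raters assigning nominal categories to the same items.

    Illustrative two-rater version only; not the authors' code.
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(rater1)
    # Use the full grading scale if given, else the categories actually observed.
    cats = list(categories) if categories is not None else sorted(set(rater1) | set(rater2))
    k = len(cats)
    if k < 2:
        raise ValueError("AC1 needs at least two categories")
    # Observed agreement: share of items given the same category by both raters.
    p_a = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: sum of pi_k * (1 - pi_k) over categories, divided by (k - 1),
    # where pi_k is the mean share of ratings falling in category k.
    p_e = 0.0
    for c in cats:
        pi_k = (rater1.count(c) + rater2.count(c)) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= (k - 1)
    return (p_a - p_e) / (1 - p_e)


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    # Accuracies reported in the abstract; n = 500 request forms.
    for name, acc in [("PP-LLM", 0.958), ("RAD1", 0.886), ("RAD2", 0.882)]:
        lo, hi = wald_ci(acc, 500)
        print(f"{name}: {acc:.1%} (95% CI, {lo:.1%} to {hi:.1%})")

    # Toy ratings (made up for illustration, not study data).
    msk1 = ["yes", "yes", "no", "no"]
    msk2 = ["yes", "yes", "no", "yes"]
    print(f"toy AC1 = {gwet_ac1(msk1, msk2):.3f}")
```

Run as a script, the first loop gives intervals that match the reported ones to one decimal place (94.0% to 97.6%, 85.8% to 91.4%, and 85.4% to 91.0%), which supports, but does not confirm, the Wald assumption. AC1 divides the chance-agreement term by the number of categories minus one, so it stays stable when almost all ratings fall in a single category, a situation in which Cohen's kappa can drop sharply despite high raw agreement.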