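Accuracy of Large Language Models in Detecting Cases Requiring Immediate Reporting in Pediatric Radiology: A Feasibility Study Using Publicly Available Clinical Vignettes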

IF 5.3 | Zone 2 (Medicine) | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging
Jun Sung Park, Jisun Hwang, Pyeong Hwa Kim, Woo Hyun Shim, Min Jeong Seo, Dahyun Kim, Jeong In Shin, In Hwa Kim, Hwon Heo, Chong Hyun Suh
Journal: Korean Journal of Radiology, vol. 26, no. 9, pp. 855-866
Publication date: 2025-09-01
Publication type: Journal Article
DOI: 10.3348/kjr.2025.0240 (https://doi.org/10.3348/kjr.2025.0240)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12394824/pdf/
Citation count: 0

Abstract

Accuracy of Large Language Models in Detecting Cases Requiring Immediate Reporting in Pediatric Radiology: A Feasibility Study Using Publicly Available Clinical Vignettes.

Objective: To evaluate the accuracy of multimodal large language models (LLMs) in detecting cases requiring immediate radiology reporting in pediatric radiology.

Materials and methods: Seventy-one publicly available, paraphrased pediatric clinical vignettes with images (sourced from the New England Journal of Medicine, The Lancet, Archives of Pediatrics & Adolescent Medicine, and Radiology) were assessed by seven vision-capable LLMs (temperature levels 0 and 1; t0 and t1) and four human readers (an expert pediatric radiologist, a trainee radiologist, an expert pediatrician, and a trainee pediatrician). Cases were classified as requiring immediate reporting (n = 33) if they corresponded to Korean Triage and Acuity Scale (KTAS) levels 1-2 (n = 24) or met the criteria for a critical value report (CVR) (n = 11). The most accurate LLM was compared with each human reader, with significance set at P < 0.013.
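The significance level of P < 0.013 is consistent with dividing a conventional alpha of 0.05 across the four LLM-versus-reader comparisons (a Bonferroni-style adjustment). The abstract does not name the correction method, so the sketch below is an assumption rather than the authors' stated procedure; the function name and the alpha of 0.05 are illustrative.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Assumed inputs: family-wise alpha of 0.05 (not stated in the abstract)
# and four comparisons, one per human reader (stated in the abstract).
threshold = bonferroni_threshold(alpha=0.05, n_comparisons=4)
print(f"Per-comparison threshold: {threshold:.4f}")
```

Under these assumptions the per-comparison threshold is 0.0125, which agrees with the reported 0.013 to three decimal places.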

Results: LLMs demonstrated 60.6%-83.1% accuracy in detecting cases requiring immediate radiology reporting: 57.7%-81.7% and 53.5%-87.3% for KTAS levels 1-2 and CVR cases, respectively. Gemini-Flash with t1 showed the highest accuracy among the LLMs: 83.1% (95% confidence interval [CI]: 74.6%-91.5%), 81.7% (95% CI: 71.8%-90.1%), and 87.3% (95% CI: 78.9%-94.4%) for identifying cases requiring immediate reporting, KTAS level 1-2 cases, and CVR cases, respectively, despite its low sensitivity for CVR detection (3/11, 27.3%). Human readers demonstrated 62.0%-84.5% accuracy for immediate radiology reporting, 73.2%-84.5% for KTAS levels 1-2, and 39.4%-94.4% for CVR cases. The accuracy of Gemini-Flash t1 in identifying cases requiring immediate radiology reporting was comparable to that of the most accurate human reader (vs. expert pediatrician: 84.5% [95% CI: 76.1%-93.0%]; P < 0.99).
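The gap between high overall accuracy and low sensitivity is easier to see in raw counts. The sketch below is a minimal illustration: only the 71 cases and the 3/11 CVR sensitivity come from the abstract, while the correct-case counts are back-calculated from the reported percentages and are therefore inferred, not reported. Function names are hypothetical.

```python
TOTAL_CASES = 71  # number of vignettes, from the abstract


def proportion_pct(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def implied_correct(accuracy_pct: float, total: int = TOTAL_CASES) -> int:
    """Whole number of correct cases implied by a reported accuracy (inferred)."""
    return round(accuracy_pct / 100 * total)


# Inferred counts for the immediate-reporting task.
gemini_correct = implied_correct(83.1)        # Gemini-Flash t1
pediatrician_correct = implied_correct(84.5)  # expert pediatrician

# Sensitivity for CVR detection uses the counts given in the abstract (3 of 11).
cvr_sensitivity = proportion_pct(3, 11)

print(gemini_correct, pediatrician_correct, cvr_sensitivity)
```

If the inferred counts are right, the best LLM and the best human reader differ by a single case on the immediate-reporting task. The CVR accuracy of 87.3% coexists with a sensitivity of 27.3% because only 11 of the 71 cases are CVR-positive, so correct negatives dominate the accuracy figure.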

Conclusion: Multimodal LLMs may achieve overall accuracy comparable to or exceeding that of human readers in identifying cases requiring immediate radiology reporting, supporting their potential use for pediatric radiology worklist prioritization. However, the models' sensitivity in detecting such cases was not reliable.

Source journal: Korean Journal of Radiology (Medicine: Nuclear Medicine)
CiteScore: 10.60
Self-citation rate: 12.50%
Articles published: 141
Review time: 1.3 months
About the journal: The inaugural issue of the Korean J Radiol came out in March 2000. The journal aims to produce and propagate knowledge on radiologic imaging and related sciences. A distinctive feature of its articles is that they reflect global trends in radiology combined with an East Asian perspective. Geographic differences in disease prevalence are reflected in the contents of its papers, which enriches the body of knowledge. Outstanding radiologists from many countries serve on the journal's editorial board.