Dedicated AI Expert System vs Generative AI With Large Language Model for Clinical Diagnoses.

IF 10.5 | Zone 1, Medicine | JCR Q1: MEDICINE, GENERAL & INTERNAL
Mitchell J Feldman, Edward P Hoffer, Jared J Conley, Jaime Chang, Jeanhee A Chung, Michael C Jernigan, William T Lester, Zachary H Strasser, Henry C Chueh
JAMA Network Open. 2025;8(5):e2512994. Published 2025-05-01. Publication type: Journal Article.
DOI: https://doi.org/10.1001/jamanetworkopen.2025.12994
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123466/pdf/

Abstract

Importance: Large language models (LLMs) have not yet been compared with traditional diagnostic decision support systems (DDSSs) on unpublished clinical cases.

Objective: To compare the performance of 2 widely used LLMs (ChatGPT, version 4 [hereafter, LLM1] and Gemini, version 1.5 [hereafter, LLM2]) with a DDSS (DXplain [hereafter, DDSS]) on 36 unpublished general medicine cases.

Design, setting, and participants: This diagnostic study, conducted from October 6, 2023, to November 22, 2024, looked for the presence of the known case diagnosis in the differential diagnoses of the LLMs and DDSS after data from previously unpublished clinical cases from 3 academic medical centers were entered. The systems' performance was assessed both with and without laboratory test data. Each case was reviewed by 3 physicians blinded to the case diagnosis. Physicians identified all clinical findings as well as the subset deemed relevant to making the diagnosis for mapping to the DDSS's controlled vocabulary. Two other physicians, also blinded to the diagnoses, entered the data from these cases into the DDSS, LLM1, and LLM2.

Exposures: All cases were entered into each LLM twice, with and without laboratory test results. For the DDSS, each case was entered 4 times: for all findings and for findings relevant to the diagnosis, each with and without laboratory test results. The top 25 diagnoses in each resulting differential diagnosis were reviewed.

Main outcomes and measures: Presence or absence of the case diagnosis in the system's differential diagnosis and, when present, in which quintile it appeared in the top 25 diagnoses.
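
The quintile outcome can be illustrated with a short sketch. It assumes the top 25 diagnoses are split into five consecutive blocks of five ranks (1 to 5, 6 to 10, and so on); the abstract does not spell out the binning, and the function name `quintile_of_rank` is hypothetical.

```python
from typing import Optional


def quintile_of_rank(rank: Optional[int], list_size: int = 25, bins: int = 5) -> Optional[int]:
    """Map a 1-based rank in a differential diagnosis to a 1-based quintile.

    Assumption (not stated in the abstract): quintiles are equal,
    consecutive blocks of ranks, so ranks 1-5 fall in quintile 1.
    Returns None when the case diagnosis is absent from the top list.
    """
    if rank is None or not 1 <= rank <= list_size:
        return None
    per_bin = list_size // bins  # 5 ranks per quintile for a top-25 list
    return (rank - 1) // per_bin + 1


if __name__ == "__main__":
    # Illustrative ranks only; these are not data from the study.
    for example_rank in (1, 5, 6, 25, None):
        print(example_rank, quintile_of_rank(example_rank))
```

A rank of `None` stands for a case diagnosis that did not appear in the system's top 25 at all, which corresponds to the "absence" arm of the outcome.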

Results: Among 36 patient cases of various races and ethnicities, genders, and ages (mean [SD] age, 51.4 [16.4] years), in the version with all findings but no laboratory test results, the DDSS listed the case diagnosis in its differential diagnosis more often (56% [20 of 36]) than LLM1 (42% [15 of 36]) and LLM2 (39% [14 of 36]), although this difference did not reach statistical significance (DDSS vs LLM1, P = .09; DDSS vs LLM2, P = .08). All 3 systems listed the case diagnosis in most cases if laboratory test results were included (all findings DDSS, 72% [26 of 36]; LLM1, 64% [23 of 36]; and LLM2, 58% [21 of 36]).
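
As a quick arithmetic check, the reported percentages follow directly from the counts above. The sketch below is only a consistency check; the dictionary layout and the name `percent_listed` are hypothetical. The P values are not recomputed, because the abstract gives only marginal counts and a paired comparison would need per-case results.

```python
# Counts of cases (out of 36) in which each system listed the case diagnosis,
# taken from the Results paragraph (DDSS figures are the all-findings version).
N_CASES = 36
COUNTS = {
    "without_labs": {"DDSS": 20, "LLM1": 15, "LLM2": 14},
    "with_labs": {"DDSS": 26, "LLM1": 23, "LLM2": 21},
}


def percent_listed(hits: int, total: int = N_CASES) -> int:
    """Return the share of cases as a whole-number percentage."""
    return round(100 * hits / total)


if __name__ == "__main__":
    for condition, by_system in COUNTS.items():
        for system, hits in by_system.items():
            print(f"{condition:13s} {system}: {hits}/{N_CASES} = {percent_listed(hits)}%")
```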

Conclusions and relevance: In this diagnostic study comparing the performance of a traditional DDSS and current LLMs on unpublished clinical cases, every system listed the case diagnosis in its top 25 diagnoses in most cases if laboratory test results were included. A hybrid approach that combines the parsing and expository linguistic capabilities of LLMs with the deterministic and explanatory capabilities of traditional DDSSs may produce synergistic benefits.

Source journal: JAMA Network Open (Medicine, General Medicine)
CiteScore: 16.00
Self-citation rate: 2.90%
Articles published: 2126
Review time: 16 weeks
Journal description: JAMA Network Open, a member of the JAMA Network, is an international, peer-reviewed, open-access general medical journal. It publishes research across health disciplines and countries, covering clinical care, innovation in health care, health policy, and global health, and it serves clinicians, investigators, and policymakers. The JAMA Network is a consortium of peer-reviewed general medical and specialty publications.