Large Language Models in Cardiology: Systematic Review.

IF 2.2 Q2 Medicine
JMIR Cardio | Pub Date: 2026-04-16 | DOI: 10.2196/76734
Moran Gendler, Girish N Nadkarni, Karin Sudri, Michal Cohen-Shelly, Benjamin S Glicksberg, Orly Efros, Shelly Soffer, Eyal Klang

Abstract


Background: Large language models (LLMs) are increasingly used in health care, but their role in cardiology has not yet been systematically evaluated.

Objective: This review aimed to assess the applications, performance, and limitations of LLMs across diverse cardiology tasks, including chronic and progressive conditions, acute events, education, and diagnostic testing.

Methods: A systematic search was conducted in PubMed and Scopus for studies published up to April 14, 2024, using keywords related to LLMs and cardiology. Studies evaluating LLM outputs in cardiology-related tasks were included. Data were extracted across 5 predefined domains, and the risk of bias was assessed using an adapted QUADAS-2 tool (developed by Whiting et al at the University of Bristol). The review protocol was registered in PROSPERO (CRD42024556397).

Results: A total of 33 studies contributed quantitative outcome data to a descriptive synthesis. Across chronic conditions, ChatGPT-3.5 (OpenAI) answered 91% (43/47) of heart failure questions accurately, although readability often required college-level comprehension. In acute scenarios, Bing Chat omitted key myocardial infarction first aid steps in 25% (5/20) to 45% (9/20) of cases, while cardiac arrest information was rated highly (mean 4.3/5, SD 0.7) but written above recommended reading levels. In physician education tasks, ChatGPT-4 (OpenAI) demonstrated higher accuracy than ChatGPT-3.5, improving from 38% (33/88) to 66% (58/88). In patient education studies, ChatGPT-4 provided scientifically adequate explanations (5.0-6.0/7) comparable to hospital materials but at higher reading levels (11th vs 7th grade). In diagnostic testing, ChatGPT-4 interpreted 91% (36/40) of electrocardiogram vignettes correctly, significantly better than emergency physicians (31/40, 77%; P<.001), but showed lower performance in echocardiography.

Conclusions: LLMs show meaningful potential in cardiology, especially for education and electrocardiogram interpretation, but performance varies across clinical tasks. Limitations in emergency guidance and readability, as well as small in silico study designs, highlight the need for multimodal models and prospective validation.

Source journal: JMIR Cardio (Computer Science: Computer Science Applications)
CiteScore: 3.50 | Self-citation rate: 0.00% | Articles per year: 25 | Review time: 12 weeks