Comparing DeepSeek and GPT-4o in ECG interpretation: Is AI improving over time?

IF 2.6 | Zone 4 (Medicine) | JCR Q2: CARDIAC & CARDIOVASCULAR SYSTEMS
Serkan Günay, Ahmet Öztürk, Anılcan Tahsin Karahan, Mert Barindik, Seval Komut, Yavuz Yiğit
DOI: 10.1016/j.hrtlng.2025.08.007 (https://doi.org/10.1016/j.hrtlng.2025.08.007)
Journal: Heart & Lung
Publication date: 2025-09-13
Publication type: Journal Article
Citations: 0

Abstract
Comparing DeepSeek and GPT-4o in ECG interpretation: Is AI improving over time?

Background: DeepSeek is a recently launched large language model (LLM), whereas GPT-4o is an advanced ChatGPT version whose electrocardiography (ECG) interpretation capabilities have been previously studied. However, DeepSeek's performance in this domain remains unexplored.

Objectives: This study aims to evaluate DeepSeek's accuracy in ECG interpretation and compare it with GPT-4o, emergency medicine specialists, and cardiologists. A secondary aim is to assess any performance changes in GPT-4o over one year.

Methods: Between February 9 and March 1, 2025, 40 ECG images (20 daily routine, 20 more challenging) from the book 150 ECG Cases were evaluated by both GPT-4o and DeepSeek, each model tested 13 times. The accuracy of their responses was compared with previously collected answers from 12 cardiologists and 12 emergency medicine specialists. GPT-4o's 2025 performance was compared to its 2024 results on identical ECGs.

Results: GPT-4o outperformed DeepSeek with higher median correct answers on daily routine (14 vs. 12), more challenging (13 vs. 10), and total ECGs (27 vs. 22) with statistically significant differences (p=0.048, p<0.001, p<0.001). A moderate agreement was observed between the responses provided by GPT-4o (p<0.001, Fleiss Kappa=0.473), while a substantial agreement was observed in the responses provided by DeepSeek (p<0.001, Fleiss Kappa=0.712). No significant year-over-year improvement was observed in GPT-4o's performance.
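
The agreement figures above are Fleiss' kappa values, which measure how consistently repeated ratings of the same items agree beyond what chance would produce. The following is a minimal sketch of that statistic, assuming each of a model's repeated runs is treated as one "rater" and each ECG as one item; the abstract does not describe the answer categories or the software used, so the function names and the toy labels below are hypothetical illustrations, not the authors' procedure. The verbal labels follow the commonly used Landis and Koch bands, which are consistent with the abstract's wording (0.473 "moderate", 0.712 "substantial"), though the abstract does not state which scale was applied.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels.

    Every item must carry the same number of ratings (here: one per run).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()  # how often each category was chosen overall
    agreement_sum = 0.0          # sum of per-item agreement P_i
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # P_i: share of rater pairs that agree on this item
        squares = sum(c * c for c in counts.values())
        agreement_sum += (squares - n_raters) / (n_raters * (n_raters - 1))

    p_bar = agreement_sum / n_items                      # observed agreement
    total = n_items * n_raters
    p_e = sum((c / total) ** 2 for c in category_totals.values())  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: 4 ECGs, 3 runs each, answers coded as option letters.
toy_runs = [
    ["A", "A", "A"],
    ["B", "B", "C"],
    ["C", "C", "C"],
    ["A", "B", "B"],
]
print(round(fleiss_kappa(toy_runs), 3))
print(landis_koch_label(0.473), landis_koch_label(0.712))
```

Read this way, a higher kappa for DeepSeek means its repeated runs gave more consistent answers to the same ECG, which is a separate question from whether those answers were correct.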

Conclusion: This first evaluation of DeepSeek in ECG interpretation reveals its performance is lower than that of GPT-4o and human experts. While GPT-4o demonstrates greater accuracy, both models fall short of expert-level performance, underscoring the need for caution and further validation before clinical integration.

Source journal
Heart & Lung
Category: Medicine, Respiratory System
CiteScore: 4.60
Self-citation rate: 3.60%
Articles published: 184
Review time: 35 days
Journal description: Heart & Lung: The Journal of Cardiopulmonary and Acute Care, the official publication of The American Association of Heart Failure Nurses, presents original, peer-reviewed articles on techniques, advances, investigations, and observations related to the care of patients with acute and critical illness and patients with chronic cardiac or pulmonary disorders. The Journal's acute care articles focus on the care of hospitalized patients, including those in the critical and acute care settings. Because most patients who are hospitalized in acute and critical care settings have chronic conditions, we are also interested in the chronically critically ill, the care of patients with chronic cardiopulmonary disorders, their rehabilitation, and disease prevention. The Journal's heart failure articles focus on all aspects of the care of patients with this condition. Manuscripts that are relevant to populations across the human lifespan are welcome.