Serkan Günay, Ahmet Öztürk, Anılcan Tahsin Karahan, Mert Barindik, Seval Komut, Yavuz Yiğit
Heart & Lung (Impact Factor 2.6; JCR Q2, Cardiac & Cardiovascular Systems). Published 2025-09-13. DOI: 10.1016/j.hrtlng.2025.08.007 (https://doi.org/10.1016/j.hrtlng.2025.08.007)
Comparing DeepSeek and GPT-4o in ECG interpretation: Is AI improving over time?
Background: DeepSeek is a recently launched large language model (LLM), whereas GPT-4o is an advanced ChatGPT version whose electrocardiography (ECG) interpretation capabilities have been previously studied. However, DeepSeek's performance in this domain remains unexplored.
Objectives: This study aims to evaluate DeepSeek's accuracy in ECG interpretation and compare it with GPT-4o, emergency medicine specialists, and cardiologists. A secondary aim is to assess any performance changes in GPT-4o over one year.
Methods: Between February 9 and March 1, 2025, 40 ECG images (20 daily routine, 20 more challenging) from the book 150 ECG Cases were evaluated by both GPT-4o and DeepSeek, each model tested 13 times. The accuracy of their responses was compared with previously collected answers from 12 cardiologists and 12 emergency medicine specialists. GPT-4o's 2025 performance was compared to its 2024 results on identical ECGs.
Results: GPT-4o outperformed DeepSeek with higher median correct answers on daily routine (14 vs. 12), more challenging (13 vs. 10), and total ECGs (27 vs. 22) with statistically significant differences (p=0.048, p<0.001, p<0.001). A moderate agreement was observed between the responses provided by GPT-4o (p<0.001, Fleiss Kappa=0.473), while a substantial agreement was observed in the responses provided by DeepSeek (p<0.001, Fleiss Kappa=0.712). No significant year-over-year improvement was observed in GPT-4o's performance.
Conclusion: This first evaluation of DeepSeek in ECG interpretation reveals its performance is lower than that of GPT-4o and human experts. While GPT-4o demonstrates greater accuracy, both models fall short of expert-level performance, underscoring the need for caution and further validation before clinical integration.
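The Results report Fleiss' kappa as the measure of how consistently each model answered across its repeated runs. The abstract does not describe how the statistic was computed, so the following is only a minimal sketch of the standard Fleiss' kappa formula, assuming each of the 13 runs is treated as a "rater", each ECG as a "subject", and each answer falls into one of a fixed set of categories. The function names and the toy counts are hypothetical and are not the study's data; the verbal labels follow the commonly used Landis and Koch bands, which match the abstract's wording (0.473 "moderate", 0.712 "substantial").

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] = number of raters (here: model runs) that assigned
    subject i (here: an ECG) to category j (here: an answer option).
    Every row must sum to the same number of raters n >= 2.
    """
    if not counts:
        raise ValueError("counts must contain at least one subject")
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per subject are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")

    # Per-subject agreement: fraction of rater pairs that agree on subject i.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    n_categories = len(counts[0])
    total = n_subjects * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    expected = sum(p * p for p in shares)

    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal agreement band after Landis and Koch (1977)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy table: 4 ECGs, 5 runs each, 3 answer options (made-up numbers).
toy_counts = [
    [5, 0, 0],
    [4, 1, 0],
    [2, 3, 0],
    [0, 0, 5],
]
toy_kappa = fleiss_kappa(toy_counts)
print(f"toy kappa = {toy_kappa:.3f} ({landis_koch_label(toy_kappa)})")
```

A kappa near 1 means the runs almost always gave the same answer for a given ECG, while a value near 0 means agreement no better than chance. Note that kappa measures consistency only: a model can agree with itself strongly and still be wrong, which is in line with DeepSeek showing higher agreement but lower accuracy in the Results.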
About the journal:
Heart & Lung: The Journal of Cardiopulmonary and Acute Care, the official publication of The American Association of Heart Failure Nurses, presents original, peer-reviewed articles on techniques, advances, investigations, and observations related to the care of patients with acute and critical illness and patients with chronic cardiac or pulmonary disorders.
The Journal's acute care articles focus on the care of hospitalized patients, including those in the critical and acute care settings. Because most patients who are hospitalized in acute and critical care settings have chronic conditions, we are also interested in the chronically critically ill, the care of patients with chronic cardiopulmonary disorders, their rehabilitation, and disease prevention. The Journal's heart failure articles focus on all aspects of the care of patients with this condition. Manuscripts that are relevant to populations across the human lifespan are welcome.