Comparison of ChatGPT and DeepSeek large language models in the diagnosis of pericarditis.

IF 2.8 Q3 CARDIAC & CARDIOVASCULAR SYSTEMS
Aman Goyal, Samia Aziz Sulaiman, Abdallah Alaarag, Waseem Hoshan, Priya Goyal, Viraj Shah, Mohamed Daoud, Gauranga Mahalwar, Abu Baker Sheikh
{"title":"ChatGPT与DeepSeek大语言模型在心包炎诊断中的比较。","authors":"Aman Goyal, Samia Aziz Sulaiman, Abdallah Alaarag, Waseem Hoshan, Priya Goyal, Viraj Shah, Mohamed Daoud, Gauranga Mahalwar, Abu Baker Sheikh","doi":"10.4330/wjc.v17.i8.110489","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The integration of sophisticated large language models (LLMs) into healthcare has recently garnered significant attention due to their ability to leverage deep learning techniques to process vast datasets and generate contextually accurate, human-like responses. These models have been previously applied in medical diagnostics, such as in the evaluation of oral lesions. Given the high rate of missed diagnoses in pericarditis, LLMs may support clinicians in generating differential diagnoses-particularly in atypical cases where risk stratification and early identification are critical to preventing serious complications such as constrictive pericarditis and pericardial tamponade.</p><p><strong>Aim: </strong>To compare the accuracy of LLMs in assisting the diagnosis of pericarditis as risk stratification tools.</p><p><strong>Methods: </strong>A PubMed search was conducted using the keyword \"pericarditis\", applying filters for \"case reports\". Data from relevant cases were extracted. Inclusion criteria consisted of English-language reports involving patients aged 18 years or older with a confirmed diagnosis of acute pericarditis. The diagnostic capabilities of ChatGPT o1 and DeepThink-R1 were assessed by evaluating whether pericarditis was included in the top three differential diagnoses and as the sole provisional diagnosis. Each case was classified as either \"yes\" or \"no\" for inclusion.</p><p><strong>Results: </strong>From the initial search, 220 studies were identified, of which 16 case reports met the inclusion criteria. 
In assessing risk stratification for acute pericarditis, ChatGPT o1 correctly identified the condition in 10 of 16 cases (62.5%) in the differential diagnosis and in 8 of 16 cases (50.0%) as the provisional diagnosis. DeepThink-R1 identified it in 8 of 16 cases (50.0%) and 6 of 16 cases (37.5%), respectively. ChatGPT o1 demonstrated higher accuracy than DeepThink-R1 in identifying pericarditis.</p><p><strong>Conclusion: </strong>Further research with larger sample sizes and optimized prompt engineering is warranted to improve diagnostic accuracy, particularly in atypical presentations.</p>","PeriodicalId":23800,"journal":{"name":"World Journal of Cardiology","volume":"17 8","pages":"110489"},"PeriodicalIF":2.8000,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12426987/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparison of ChatGPT and DeepSeek large language models in the diagnosis of pericarditis.\",\"authors\":\"Aman Goyal, Samia Aziz Sulaiman, Abdallah Alaarag, Waseem Hoshan, Priya Goyal, Viraj Shah, Mohamed Daoud, Gauranga Mahalwar, Abu Baker Sheikh\",\"doi\":\"10.4330/wjc.v17.i8.110489\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The integration of sophisticated large language models (LLMs) into healthcare has recently garnered significant attention due to their ability to leverage deep learning techniques to process vast datasets and generate contextually accurate, human-like responses. These models have been previously applied in medical diagnostics, such as in the evaluation of oral lesions. 
Given the high rate of missed diagnoses in pericarditis, LLMs may support clinicians in generating differential diagnoses-particularly in atypical cases where risk stratification and early identification are critical to preventing serious complications such as constrictive pericarditis and pericardial tamponade.</p><p><strong>Aim: </strong>To compare the accuracy of LLMs in assisting the diagnosis of pericarditis as risk stratification tools.</p><p><strong>Methods: </strong>A PubMed search was conducted using the keyword \\\"pericarditis\\\", applying filters for \\\"case reports\\\". Data from relevant cases were extracted. Inclusion criteria consisted of English-language reports involving patients aged 18 years or older with a confirmed diagnosis of acute pericarditis. The diagnostic capabilities of ChatGPT o1 and DeepThink-R1 were assessed by evaluating whether pericarditis was included in the top three differential diagnoses and as the sole provisional diagnosis. Each case was classified as either \\\"yes\\\" or \\\"no\\\" for inclusion.</p><p><strong>Results: </strong>From the initial search, 220 studies were identified, of which 16 case reports met the inclusion criteria. In assessing risk stratification for acute pericarditis, ChatGPT o1 correctly identified the condition in 10 of 16 cases (62.5%) in the differential diagnosis and in 8 of 16 cases (50.0%) as the provisional diagnosis. DeepThink-R1 identified it in 8 of 16 cases (50.0%) and 6 of 16 cases (37.5%), respectively. 
ChatGPT o1 demonstrated higher accuracy than DeepThink-R1 in identifying pericarditis.</p><p><strong>Conclusion: </strong>Further research with larger sample sizes and optimized prompt engineering is warranted to improve diagnostic accuracy, particularly in atypical presentations.</p>\",\"PeriodicalId\":23800,\"journal\":{\"name\":\"World Journal of Cardiology\",\"volume\":\"17 8\",\"pages\":\"110489\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12426987/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Journal of Cardiology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4330/wjc.v17.i8.110489\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"CARDIAC & CARDIOVASCULAR SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Journal of Cardiology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4330/wjc.v17.i8.110489","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"CARDIAC & CARDIOVASCULAR SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract


Background: The integration of sophisticated large language models (LLMs) into healthcare has recently garnered significant attention due to their ability to leverage deep learning techniques to process vast datasets and generate contextually accurate, human-like responses. These models have previously been applied in medical diagnostics, such as in the evaluation of oral lesions. Given the high rate of missed diagnoses in pericarditis, LLMs may support clinicians in generating differential diagnoses, particularly in atypical cases where risk stratification and early identification are critical to preventing serious complications such as constrictive pericarditis and pericardial tamponade.

Aim: To compare the accuracy of LLMs, used as risk stratification tools, in assisting the diagnosis of pericarditis.

Methods: A PubMed search was conducted using the keyword "pericarditis", applying a filter for "case reports". Data from relevant cases were extracted. Inclusion criteria consisted of English-language reports involving patients aged 18 years or older with a confirmed diagnosis of acute pericarditis. The diagnostic capabilities of ChatGPT o1 and DeepThink-R1 were assessed by evaluating whether pericarditis was included in the top three differential diagnoses and whether it was offered as the sole provisional diagnosis. Each case was classified as "yes" or "no" for each criterion.
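The binary scoring protocol above can be sketched as follows. This is a minimal illustration, not the authors' actual workflow: the data structures, function names, and substring matching are assumptions.

```python
def score_case(differentials, provisional, target="pericarditis"):
    """Score one case: was the target condition among the model's top-three
    differentials, and was it the sole provisional diagnosis?"""
    in_top3 = any(target in d.lower() for d in differentials[:3])
    is_provisional = target in provisional.lower()
    return in_top3, is_provisional

def tally(cases, target="pericarditis"):
    """cases: list of (differentials, provisional) pairs for one model.
    Returns the fraction of "yes" cases for each criterion."""
    scores = [score_case(d, p, target) for d, p in cases]
    n = len(scores)
    top3_rate = sum(t for t, _ in scores) / n
    provisional_rate = sum(p for _, p in scores) / n
    return top3_rate, provisional_rate
```

Running `tally` over 16 such case scores would yield the per-criterion proportions reported in the Results.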

Results: From the initial search, 220 studies were identified, of which 16 case reports met the inclusion criteria. In the risk stratification assessment for acute pericarditis, ChatGPT o1 included the condition among its top three differential diagnoses in 10 of 16 cases (62.5%) and identified it as the provisional diagnosis in 8 of 16 cases (50.0%); DeepThink-R1 did so in 8 of 16 (50.0%) and 6 of 16 (37.5%) cases, respectively. ChatGPT o1 therefore demonstrated higher accuracy than DeepThink-R1 in identifying pericarditis.
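As a quick arithmetic check, the reported percentages follow directly from the per-model counts taken from the abstract:

```python
# Counts reported in the abstract (16 cases per model).
N = 16
counts = {
    "ChatGPT o1":   {"top-3 differential": 10, "provisional": 8},
    "DeepThink-R1": {"top-3 differential": 8,  "provisional": 6},
}

for model, hits in counts.items():
    for criterion, k in hits.items():
        print(f"{model:13s} {criterion}: {k}/{N} = {100 * k / N:.1f}%")
# prints 62.5%, 50.0%, 50.0%, and 37.5%, matching the abstract
```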

Conclusion: Further research with larger sample sizes and optimized prompt engineering is warranted to improve diagnostic accuracy, particularly in atypical presentations.

Source journal: World Journal of Cardiology (Cardiac & Cardiovascular Systems). CiteScore 3.30; self-citation rate 5.30%; 54 articles published.