Challenging cases of hyponatremia incorrectly interpreted by ChatGPT.

Impact Factor 2.7 · CAS Zone 2 (Medicine) · JCR Q1, EDUCATION & EDUCATIONAL RESEARCH
Kenrick Berend, Ashley Duits, Reinold O B Gans
DOI: 10.1186/s12909-025-07235-2
Journal: BMC Medical Education, 25(1):751 · Published 2025-05-22 · Journal Article
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12100905/pdf/
Citations: 0

Abstract

Background: In clinical medicine, the assessment of hyponatremia is frequently required but is also a known source of major diagnostic errors, substantial mismanagement, and iatrogenic morbidity. Because artificial intelligence techniques are efficient at analyzing complex problems, their use may overcome current assessment limitations. There is no literature on the use of Chat Generative Pre-trained Transformer (ChatGPT-3.5) for evaluating difficult hyponatremia cases. Because of their interesting pathophysiology, hyponatremia cases are often used in medical education to train students in patient evaluation, and students increasingly use artificial intelligence as a diagnostic tool. To evaluate this possibility, four previously published challenging hyponatremia cases were presented to the free ChatGPT-3.5 for diagnosis and treatment suggestions.

Methods: We used four challenging hyponatremia cases that had been published previously and evaluated by 46 physicians in Canada, the Netherlands, South Africa, Taiwan, and the USA. These four cases were presented twice to the free ChatGPT (version 3.5), in December 2023 and again in September 2024, with the request to recommend a diagnosis and therapy. ChatGPT's responses were compared with those of the clinicians.

Results: Cases 1 and 3 had a single cause of hyponatremia; cases 2 and 4 had two contributing causes. In none of the four cases did ChatGPT in 2023, or the 46 clinicians whose assessment was described in the original publication, recognize the most crucial cause of hyponatremia, the one with major therapeutic consequences. In 2024, ChatGPT correctly diagnosed and suggested adequate management in one case. Concurrent Addison's disease in case 1 was correctly recognized by ChatGPT in both 2023 and 2024, whereas 81% of the clinicians missed this diagnosis. ChatGPT gave no proper therapeutic recommendations in 2023 in any of the four cases, but gave adequate advice in one case in 2024. The 46 clinicians recommended inadequate therapy in 65%, 57%, 2%, and 76% of responses for cases 1 to 4, respectively.

Conclusion: Our study does not currently support the use of the free version of ChatGPT 3.5 for difficult hyponatremia cases, although a small improvement was observed after ten months with the same ChatGPT 3.5 version. Patients, health professionals, medical educators, and students should be aware of the shortcomings of ChatGPT's diagnostic and therapeutic suggestions.

Source journal: BMC Medical Education (EDUCATION, SCIENTIFIC DISCIPLINES)
CiteScore: 4.90
Self-citation rate: 11.10%
Annual publication volume: 795
Review time: 6 months
期刊介绍: BMC Medical Education is an open access journal publishing original peer-reviewed research articles in relation to the training of healthcare professionals, including undergraduate, postgraduate, and continuing education. The journal has a special focus on curriculum development, evaluations of performance, assessment of training needs and evidence-based medicine.