Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation.

Chrysanthos D Christou, Olga Sitsiani, Panagiotis Boutos, Georgios Katsanos, Georgios Papadakis, Anastasios Tefas, Vassilios Papalois, Georgios Tsoulfas
{"title":"Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation.","authors":"Chrysanthos D Christou, Olga Sitsiani, Panagiotis Boutos, Georgios Katsanos, Georgios Papadakis, Anastasios Tefas, Vassilios Papalois, Georgios Tsoulfas","doi":"10.5500/wjt.v15.i3.103536","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, comparative assessment of their effectiveness in clinical decision-making remains limited.</p><p><strong>Aim: </strong>To compare the use of ChatGPT and GPT-4 as potential tools in AI-assisted clinical practice in these challenging disciplines.</p><p><strong>Methods: </strong>In total, 400 different questions tested ChatGPT's/GPT-4 knowledge and decision-making capacity in various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a plethora of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.</p><p><strong>Results: </strong>ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions (<i>P</i> < 0.001). Regarding the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and partial agreement of 17.46%, while GPT-4 demonstrated an agreement rate of 80.95% and partial agreement of 9.52% (<i>P</i> = 0.01). Regarding the 43 questions from unpublished cases, ChatGPT demonstrated an agreement rate of 53.49% and partial agreement of 23.26%, while GPT-4 demonstrated an agreement rate of 72.09% and partial agreement of 6.98% (<i>P</i> = 0.004). When factoring by the nature of the task for all cases, notably, GPT-4 demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (<i>P</i> = 0.008), and successfully predicting the prognosis of the patient in 100% of related questions (<i>P</i> < 0.001).</p><p><strong>Conclusion: </strong>GPT-4 consistently provided more accurate and reliable clinical recommendations with higher percentages of full agreements both in renal and liver transplantation compared with ChatGPT. Our findings support the potential utility of AI models like ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and facilitating decision-making. 
The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.</p>","PeriodicalId":65557,"journal":{"name":"世界移植杂志","volume":"15 3","pages":"103536"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12038595/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"世界移植杂志","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5500/wjt.v15.i3.103536","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Kidney and liver transplantation are two sub-specialized medical disciplines in which transplant professionals spend decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, comparative assessment of their effectiveness in clinical decision-making remains limited.

Aim: To compare ChatGPT-3.5 and GPT-4 as potential tools for AI-assisted clinical practice in these challenging disciplines.

Methods: In total, 400 different questions tested the knowledge and decision-making capacity of ChatGPT and GPT-4 across various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions were derived from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a broad range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.
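As an illustration of how such an evaluation could be scripted, here is a minimal sketch using the OpenAI Python client. The paper does not publish code, so the model identifiers, the questions file, and the crude answer-matching scorer below are all assumptions for illustration only; the study's actual prompting and scoring were performed by the authors.

```python
# Minimal sketch of batch-querying two models with the same question set.
# Assumes OPENAI_API_KEY is set in the environment and a questions.json file
# of the form [{"id": 1, "question": "...", "answer": "B"}, ...] exists --
# both are illustrative assumptions, not artifacts from the study.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, question: str) -> str:
    """Send one multiple-choice question to the given model."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer with the single best option."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

with open("questions.json") as f:
    questions = json.load(f)

for model in ("gpt-3.5-turbo", "gpt-4"):
    # Crude scorer: treats a reply that starts with the answer key as correct.
    correct = sum(
        1 for q in questions if ask(model, q["question"]).startswith(q["answer"])
    )
    print(f"{model}: {correct}/{len(questions)} correct")
```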

Results: ChatGPT correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 performed better, answering 70.7% of questions correctly (P < 0.001). On the 63 questions from published cases, ChatGPT achieved an agreement rate of 50.79% and partial agreement of 17.46%, while GPT-4 achieved an agreement rate of 80.95% and partial agreement of 9.52% (P = 0.01). On the 43 questions from unpublished cases, ChatGPT achieved an agreement rate of 53.49% and partial agreement of 23.26%, while GPT-4 achieved an agreement rate of 72.09% and partial agreement of 6.98% (P = 0.004). Notably, when results were stratified by task type across all cases, GPT-4 performed strongly, providing a differential diagnosis that included the final diagnosis in 90% of cases (P = 0.008) and correctly predicting the patient's prognosis in 100% of related questions (P < 0.001).
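The reported significance for the multiple-choice comparison can be approximately reproduced from the stated percentages with a standard chi-square test. A short sketch, assuming the correct-answer counts implied by 50.3% and 70.7% of 294 questions (rounded reconstructions, not the authors' raw data):

```python
# Chi-square test on the 2x2 table of correct/incorrect counts implied by
# the reported accuracies. Counts are rounded reconstructions for
# illustration, not the study's raw data.
from scipy.stats import chi2_contingency

n = 294
chatgpt_correct = round(0.503 * n)   # ~148
gpt4_correct = round(0.707 * n)      # ~208

table = [
    [chatgpt_correct, n - chatgpt_correct],
    [gpt4_correct, n - gpt4_correct],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # p falls well below 0.001, consistent with the paper
```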

Conclusion: GPT-4 consistently provided more accurate and reliable clinical recommendations than ChatGPT, with higher rates of full agreement in both renal and liver transplantation. Our findings support the potential utility of AI models such as ChatGPT and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information and as aids to decision-making. The continued progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.
