Chrysanthos D Christou, Olga Sitsiani, Panagiotis Boutos, Georgios Katsanos, Georgios Papadakis, Anastasios Tefas, Vassilios Papalois, Georgios Tsoulfas
World Journal of Transplantation. 2025 Sep 18;15(3):103536. doi: 10.5500/wjt.v15.i3.103536. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12038595/pdf/
Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation.
Background: Kidney and liver transplantation are two sub-specialized medical disciplines, with transplant professionals spending decades in training. While artificial intelligence-based (AI-based) tools could potentially assist in everyday clinical practice, comparative assessment of their effectiveness in clinical decision-making remains limited.
Aim: To compare ChatGPT-3.5 and GPT-4 as potential tools for AI-assisted clinical practice in these challenging disciplines.
Methods: In total, 400 different questions tested the knowledge and decision-making capacity of ChatGPT-3.5 and GPT-4 across various renal and liver transplantation concepts. Specifically, 294 multiple-choice questions were derived from open-access sources, 63 questions from published open-access case reports, and 43 from unpublished cases of patients treated at our department. The evaluation covered a broad range of topics, including clinical predictors, treatment options, and diagnostic criteria, among others.
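The case-based questions were graded into agreement categories (full, partial, or no agreement with the clinical ground truth). The paper's actual grading rubric and data handling are not described in the abstract, so the category names, the `agreement_rates` helper, and the illustrative tally below (32 full, 11 partial, 20 none out of 63) are assumptions reconstructed from the reported percentages; this is a minimal sketch of how such rates could be computed.

```python
from collections import Counter
from enum import Enum

class Agreement(Enum):
    # Hypothetical category labels; the study's exact rubric is not specified.
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"

def agreement_rates(ratings):
    """Return the fraction of rated questions falling in each category."""
    counts = Counter(ratings)
    n = len(ratings)
    return {a: counts.get(a, 0) / n for a in Agreement}

# Illustrative counts reconstructed from the published-case figures for
# ChatGPT-3.5: 50.79% full and 17.46% partial agreement over 63 questions.
ratings = [Agreement.FULL] * 32 + [Agreement.PARTIAL] * 11 + [Agreement.NONE] * 20
rates = agreement_rates(ratings)
print({a.value: round(100 * r, 2) for a, r in rates.items()})
# → {'full': 50.79, 'partial': 17.46, 'none': 31.75}
```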
Results: ChatGPT-3.5 correctly answered 50.3% of the 294 multiple-choice questions, while GPT-4 demonstrated a higher performance, answering 70.7% of questions (P < 0.001). Regarding the 63 questions from published cases, ChatGPT-3.5 achieved an agreement rate of 50.79% and partial agreement of 17.46%, while GPT-4 demonstrated an agreement rate of 80.95% and partial agreement of 9.52% (P = 0.01). Regarding the 43 questions from unpublished cases, ChatGPT-3.5 demonstrated an agreement rate of 53.49% and partial agreement of 23.26%, while GPT-4 demonstrated an agreement rate of 72.09% and partial agreement of 6.98% (P = 0.004). When results were stratified by the nature of the task across all cases, GPT-4 notably demonstrated outstanding performance, providing a differential diagnosis that included the final diagnosis in 90% of the cases (P = 0.008) and successfully predicting the patient's prognosis in 100% of related questions (P < 0.001).
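The abstract does not state which statistical test produced these P values; since both models answered the same questions, a paired test such as McNemar's would arguably be the natural choice. As a rough illustration only, an unpaired two-proportion z-test on counts reconstructed from the reported percentages (approximately 148 and 208 correct out of 294) also lands well below the 0.001 threshold; the counts and test choice here are assumptions, not the authors' method.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z statistic, two-sided P value)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error under H0
    z = (p2 - p1) / se
    pval = math.erfc(abs(z) / math.sqrt(2))          # 2 * (1 - Phi(|z|))
    return z, pval

# Counts approximated from the reported 50.3% and 70.7% of 294 questions.
z, p = two_proportion_z(148, 294, 208, 294)
print(f"z = {z:.2f}, p = {p:.1e}")  # P well below 0.001, consistent with the abstract
```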
Conclusion: Compared with ChatGPT-3.5, GPT-4 consistently provided more accurate and reliable clinical recommendations, with higher rates of full agreement in both renal and liver transplantation. Our findings support the potential utility of AI models such as ChatGPT-3.5 and GPT-4 in AI-assisted clinical practice as sources of accurate, individualized medical information that facilitate decision-making. The progression and refinement of such AI-based tools could reshape the future of clinical practice, making their early adoption and adaptation by physicians a necessity.