ChatGPT-4o outperforms Gemini Advanced in assisting multidisciplinary decision-making for advanced gastric cancer
Huizi Li, Jiaobao Huang, Kuntang Liu, Jibiao Liu, Queling Liu, Zhiyong Zhou, Zhen Zong, Shengxun Mao
Journal: EJSO, volume 51, issue 8, Article 110096
Publication date: 2025-04-24
DOI: 10.1016/j.ejso.2025.110096
URL: https://www.sciencedirect.com/science/article/pii/S0748798325005244
Abstract
Background & aims
The treatment of advanced gastric cancer (GC) requires precise and comprehensive clinical decision-making. Artificial intelligence (AI) chatbots are potential tools for enhancing multidisciplinary team (MDT) discussions. This study compares the performance of ChatGPT-4o and Gemini Advanced in generating treatment recommendations for advanced GC.
Methods
The study involved three steps: (1) evaluating responses to ten critical clinical questions, (2) analyzing clinical cases from MDT meetings at our institution, and (3) reviewing rare GC cases from PubMed. It included 95 patients with advanced GC discussed between November 2022 and July 2024, and 14 rare cases from PubMed. Prompts designed from advanced GC cases were submitted to ChatGPT-4o and Gemini Advanced in a standardized format. Outputs were evaluated for accuracy and completeness on a structured 4-point Likert scale. Interrater reliability was calculated to assess consistency among evaluators.
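The abstract does not say which interrater statistic was used. As an illustration only, the sketch below computes Cohen's kappa for two evaluators scoring the same outputs on a 4-point scale. The choice of Cohen's kappa, the function name `cohen_kappa`, and the example scores are all assumptions and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items.

    Assumption: the study's actual reliability statistic is not stated
    in the abstract; kappa is used here purely as an example.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal score frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical score for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical 4-point Likert scores (1 = worst, 4 = best) for 8 outputs.
rater_a = [4, 3, 4, 2, 1, 4, 3, 3]
rater_b = [4, 3, 3, 2, 1, 4, 3, 2]
kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

With these made-up scores the raters agree on 6 of 8 items, and chance agreement is 18/64, so kappa comes to 15/23 (about 0.652). For ordinal scales such as Likert ratings, a weighted kappa or an intraclass correlation coefficient is often preferred, because these give partial credit for near-misses.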
Results
For the ten clinical questions, ChatGPT-4o outperformed Gemini Advanced. In MDT cases, ChatGPT-4o provided more valuable recommendations regarding surgical suggestions, chemotherapy recommendations, and chemotherapy regimens. Subgroup analysis confirmed these findings in both routine and complex cases, with high interrater reliability. ChatGPT-4o also outperformed Gemini Advanced in the analysis of rare GC cases from PubMed, showing superior accuracy with high interrater reliability.
Conclusions
While our findings suggest that AI chatbots can generate clinically relevant and guideline-based treatment recommendations, their use in MDT decision-making should be viewed as supportive rather than autonomous. We emphasize that although AI chatbots have potential as decision-support tools, they should be integrated only under expert supervision in a real-world clinical context.
About the journal:
EJSO - European Journal of Surgical Oncology ("the Journal of Cancer Surgery") is the Official Journal of the European Society of Surgical Oncology and BASO ~ the Association for Cancer Surgery.
The EJSO aims to advance surgical oncology research and practice through the publication of original research articles, review articles, editorials, debates and correspondence.