AI at the Forefront: Navigating Oncologic Care for Six Gastrointestinal Cancers According to the NCCN Guidelines Utilizing Gemini-1.0 Ultra and ChatGPT-4.

Impact Factor 1.9 · CAS Tier 3 (Medicine) · JCR Q3 (Oncology)
Tamir E Bresler, Tyler Wilson, Tadevos Makaryan, Shivam Pandya, Kevin Palmer, Ryan Meyer, Zin M Htway, Manabu Fujita
{"title":"AI at the Forefront: Navigating Oncologic Care for Six Gastrointestinal Cancers According to the NCCN Guidelines Utilizing Gemini-1.0 Ultra and ChatGPT-4.","authors":"Tamir E Bresler, Tyler Wilson, Tadevos Makaryan, Shivam Pandya, Kevin Palmer, Ryan Meyer, Zin M Htway, Manabu Fujita","doi":"10.1002/jso.70005","DOIUrl":null,"url":null,"abstract":"<p><strong>Background and objectives: </strong>We explored the ability of large language models (LLMs) ChatGPT-4 and Gemini 1.0 Ultra in guiding clinical decision-making for six gastrointestinal cancers using the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines.</p><p><strong>Methods: </strong>We reviewed the NCCN Guidelines for anal squamous cell carcinoma, small bowel, ampullary, and pancreatic adenocarcinoma, and biliary tract and gastric cancers. Clinical questions were designed and categorized by type, queried up to three times, and rated on a Likert scale: (5) Correct; (4) Correct following clarification; (3) Correct but incomplete; (2) Partially incorrect; (1) Absolutely incorrect. Subgroup analysis was conducted on Correctness (scores 3-5) and Accuracy (scores 4-5).</p><p><strong>Results: </strong>A total of 270 questions were generated (range-per-cancer 32-68). ChatGPT-4 versus Gemini 1.0 Ultra score differences were not statistically-significant (Mean Rank 278.30 vs. 262.70, p = 0.222). Correctness was seen in 77.78% versus 75.93% of responses, and Accuracy in 64.81% versus 57.41%. There were no statistically-significant differences in Correctness or Accuracy between LLMs in terms of question or cancer type.</p><p><strong>Conclusions: </strong>Both LLMs demonstrated a limited capacity to assist with complex clinical decision-making. Their current Accuracy level falls below the acceptable threshold for clinical use. Future studies exploring LLMs in the healthcare domain are warranted.</p>","PeriodicalId":17111,"journal":{"name":"Journal of Surgical Oncology","volume":" ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Surgical Oncology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/jso.70005","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ONCOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Background and objectives: We explored the ability of the large language models (LLMs) ChatGPT-4 and Gemini 1.0 Ultra to guide clinical decision-making for six gastrointestinal cancers using the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines.

Methods: We reviewed the NCCN Guidelines for anal squamous cell carcinoma; small bowel, ampullary, and pancreatic adenocarcinoma; and biliary tract and gastric cancers. Clinical questions were designed and categorized by type, queried up to three times each, and rated on a Likert scale: (5) Correct; (4) Correct following clarification; (3) Correct but incomplete; (2) Partially incorrect; (1) Absolutely incorrect. Subgroup analyses were conducted on Correctness (scores 3-5) and Accuracy (scores 4-5).
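To make the rating scheme concrete, the sketch below (hypothetical code, not the authors' analysis; the ratings shown are invented placeholders) maps the Likert scores to the two binary subgroup measures:

```python
# Hypothetical sketch of the paper's scoring scheme (not the authors' code).
# Each LLM response receives a Likert rating from 1 to 5; the two subgroup
# measures are binary cutoffs on that rating.

def is_correct(score: int) -> bool:
    """Correctness: scores 3-5 (correct, possibly incomplete or after clarification)."""
    return score >= 3

def is_accurate(score: int) -> bool:
    """Accuracy: scores 4-5 (fully correct, at most requiring clarification)."""
    return score >= 4

# Invented ratings for a handful of responses, for illustration only:
ratings = [5, 4, 3, 2, 5, 1, 4, 3]
correctness = sum(is_correct(s) for s in ratings) / len(ratings)
accuracy = sum(is_accurate(s) for s in ratings) / len(ratings)
print(f"Correctness: {correctness:.2%}, Accuracy: {accuracy:.2%}")
```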

Results: A total of 270 questions were generated (range per cancer: 32-68). The difference in scores between ChatGPT-4 and Gemini 1.0 Ultra was not statistically significant (mean rank 278.30 vs. 262.70, p = 0.222). Correctness was seen in 77.78% versus 75.93% of responses, and Accuracy in 64.81% versus 57.41%. There were no statistically significant differences in Correctness or Accuracy between the LLMs by question type or cancer type.
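The abstract does not name the statistical test, but reporting mean ranks alongside a p value is consistent with a Mann-Whitney U comparison of the two score distributions. A minimal sketch under that assumption, using invented score arrays rather than study data, might look like:

```python
# Assumed analysis: a Mann-Whitney U test comparing the two LLMs' Likert
# score distributions (the test is inferred from the reported mean ranks,
# not confirmed by the abstract; the scores below are placeholders).
from scipy.stats import mannwhitneyu

chatgpt_scores = [5, 4, 3, 5, 2, 4, 5, 3, 1, 4]   # hypothetical ratings
gemini_scores  = [4, 3, 3, 5, 2, 4, 4, 2, 1, 5]   # hypothetical ratings

stat, p = mannwhitneyu(chatgpt_scores, gemini_scores, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")  # p > 0.05 => no significant difference
```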

Conclusions: Both LLMs demonstrated a limited capacity to assist with complex clinical decision-making. Their current Accuracy level falls below the acceptable threshold for clinical use. Future studies exploring LLMs in the healthcare domain are warranted.

Source journal: Journal of Surgical Oncology
CiteScore: 4.70
Self-citation rate: 4.00%
Articles published per year: 367
Review turnaround: 2 months
About the journal: The Journal of Surgical Oncology offers peer-reviewed, original papers in the field of surgical oncology and broadly related surgical sciences, including reports on experimental and laboratory studies. As an international journal, the editors encourage participation from leading surgeons around the world. The JSO is the representative journal for the World Federation of Surgical Oncology Societies. Publishing 16 issues in 2 volumes each year, the journal accepts Research Articles, in-depth Reviews of timely interest, Letters to the Editor, and invited Editorials. Guest Editors from the JSO Editorial Board oversee multiple special Seminars issues each year. These Seminars include multifaceted Reviews on a particular topic or current issue in surgical oncology, which are invited from experts in the field.