AI at the Forefront: Navigating Oncologic Care for Six Gastrointestinal Cancers According to the NCCN Guidelines Utilizing Gemini-1.0 Ultra and ChatGPT-4.
Tamir E Bresler, Tyler Wilson, Tadevos Makaryan, Shivam Pandya, Kevin Palmer, Ryan Meyer, Zin M Htway, Manabu Fujita
Journal of Surgical Oncology. Published 2025-06-19. DOI: 10.1002/jso.70005 (https://doi.org/10.1002/jso.70005)
Abstract
Background and objectives: We explored the ability of the large language models (LLMs) ChatGPT-4 and Gemini 1.0 Ultra to guide clinical decision-making for six gastrointestinal cancers using the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines.
Methods: We reviewed the NCCN Guidelines for anal squamous cell carcinoma, small bowel, ampullary, and pancreatic adenocarcinoma, and biliary tract and gastric cancers. Clinical questions were designed and categorized by type, queried up to three times, and rated on a Likert scale: (5) Correct; (4) Correct following clarification; (3) Correct but incomplete; (2) Partially incorrect; (1) Absolutely incorrect. Subgroup analysis was conducted on Correctness (scores 3-5) and Accuracy (scores 4-5).
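The two subgroup definitions in the Methods can be sketched as simple thresholds on the Likert score. This is an illustrative sketch only, not the authors' analysis code: the function names and the example scores are hypothetical, and only the score labels and the cut-offs (Correctness = 3-5, Accuracy = 4-5) come from the abstract.

```python
from typing import Dict, Iterable

# Likert labels as given in the Methods.
LIKERT_LABELS = {
    5: "Correct",
    4: "Correct following clarification",
    3: "Correct but incomplete",
    2: "Partially incorrect",
    1: "Absolutely incorrect",
}


def is_correct(score: int) -> bool:
    """Correctness subgroup: scores 3-5."""
    return 3 <= score <= 5


def is_accurate(score: int) -> bool:
    """Accuracy subgroup: scores 4-5."""
    return 4 <= score <= 5


def subgroup_rates(scores: Iterable[int]) -> Dict[str, float]:
    """Return the fraction of responses that fall in each subgroup.

    Hypothetical helper: raises ValueError on an empty input or on a
    score outside the 1-5 scale.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("at least one score is required")
    for s in scores:
        if s not in LIKERT_LABELS:
            raise ValueError(f"score {s!r} is not on the 1-5 Likert scale")
    n = len(scores)
    return {
        "correctness": sum(is_correct(s) for s in scores) / n,
        "accuracy": sum(is_accurate(s) for s in scores) / n,
    }


# Made-up scores for six responses, purely for illustration.
example_scores = [5, 4, 3, 2, 1, 5]
print(subgroup_rates(example_scores))
```

Note that every Accurate response is also Correct under these definitions, so the Accuracy rate can never exceed the Correctness rate.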
Results: A total of 270 questions were generated (range per cancer: 32-68). Score differences between ChatGPT-4 and Gemini 1.0 Ultra were not statistically significant (mean rank 278.30 vs. 262.70, p = 0.222). Correctness was seen in 77.78% versus 75.93% of responses, and Accuracy in 64.81% versus 57.41%. There were no statistically significant differences in Correctness or Accuracy between the LLMs by question type or cancer type.
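As a reading aid, the reported percentages can be converted back into response counts, and the two mean ranks can be checked for internal consistency. The sketch below assumes that each model answered all 270 questions and that the mean ranks come from pooling both models' scores into one ranking (as in a Mann-Whitney-type test); the abstract states neither explicitly, so both are assumptions, and the helper names are hypothetical.

```python
import math

N_QUESTIONS = 270  # questions per model (assumed denominator)

# Percentages as reported in the Results.
REPORTED_PERCENT = {
    ("Correctness", "ChatGPT-4"): 77.78,
    ("Correctness", "Gemini 1.0 Ultra"): 75.93,
    ("Accuracy", "ChatGPT-4"): 64.81,
    ("Accuracy", "Gemini 1.0 Ultra"): 57.41,
}

MEAN_RANK = {"ChatGPT-4": 278.30, "Gemini 1.0 Ultra": 262.70}


def implied_count(percent: float, n: int = N_QUESTIONS) -> int:
    """Nearest whole number of responses matching a reported percentage."""
    return round(percent / 100 * n)


def round_trips(percent: float, n: int = N_QUESTIONS) -> bool:
    """True if the implied count reproduces the percentage to 2 decimals."""
    return round(implied_count(percent, n) / n * 100, 2) == percent


def mean_ranks_consistent(rank_a: float, rank_b: float,
                          n: int = N_QUESTIONS) -> bool:
    """With two equal groups of size n ranked together (2n observations),
    the two mean ranks must average to (2n + 1) / 2."""
    return math.isclose((rank_a + rank_b) / 2, (2 * n + 1) / 2)


for (measure, model), pct in REPORTED_PERCENT.items():
    print(f"{measure:11s} {model:17s} {pct:5.2f}% -> "
          f"{implied_count(pct)} of {N_QUESTIONS} "
          f"(round-trips: {round_trips(pct)})")

print("Mean ranks consistent with 2 x 270 pooled scores:",
      mean_ranks_consistent(*MEAN_RANK.values()))
```

Under these assumptions the percentages correspond to whole-number counts out of 270, which supports reading 270 as the per-model denominator.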
Conclusions: Both LLMs demonstrated a limited capacity to assist with complex clinical decision-making. Their current Accuracy level falls below the acceptable threshold for clinical use. Future studies exploring LLMs in the healthcare domain are warranted.
About the journal:
The Journal of Surgical Oncology offers peer-reviewed, original papers in the field of surgical oncology and broadly related surgical sciences, including reports on experimental and laboratory studies. As an international journal, it encourages participation from leading surgeons around the world. The JSO is the representative journal for the World Federation of Surgical Oncology Societies. Publishing 16 issues in 2 volumes each year, the journal accepts Research Articles, in-depth Reviews of timely interest, Letters to the Editor, and invited Editorials. Guest Editors from the JSO Editorial Board oversee multiple special Seminars issues each year. These Seminars include multifaceted Reviews, invited from experts in the field, on a particular topic or current issue in surgical oncology.