Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) Ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations?

IF 2.8 | CAS Tier 4 (Medicine) | Q2 GASTROENTEROLOGY & HEPATOLOGY
Journal of Clinical Gastroenterology | Pub Date: 2024-11-01 | Epub Date: 2024-02-07 | DOI: 10.1097/MCG.0000000000001979
Lisandro Pereyra, Francisco Schlottmann, Leandro Steinberg, Juan Lasa
{"title":"Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations?","authors":"Lisandro Pereyra, Francisco Schlottmann, Leandro Steinberg, Juan Lasa","doi":"10.1097/MCG.0000000000001979","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To determine whether a publicly available advanced language model could help determine appropriate colorectal cancer (CRC) screening and surveillance recommendations.</p><p><strong>Background: </strong>Poor physician knowledge or inability to accurately recall recommendations might affect adherence to CRC screening guidelines. Adoption of newer technologies can help improve the delivery of such preventive care services.</p><p><strong>Methods: </strong>An assessment with 10 multiple choice questions, including 5 CRC screening and 5 CRC surveillance clinical vignettes, was inputted into chat generative pretrained transformer (ChatGPT) 3.5 in 4 separate sessions. Responses were recorded and screened for accuracy to determine the reliability of this tool. The mean number of correct answers was then compared against a control group of gastroenterologists and colorectal surgeons answering the same questions with and without the help of a previously validated CRC screening mobile app.</p><p><strong>Results: </strong>The average overall performance of ChatGPT was 45%. The mean number of correct answers was 2.75 (95% CI: 2.26-3.24), 1.75 (95% CI: 1.26-2.24), and 4.5 (95% CI: 3.93-5.07) for screening, surveillance, and total questions, respectively. ChatGPT showed inconsistency and gave a different answer in 4 questions among the different sessions. A total of 238 physicians also responded to the assessment; 123 (51.7%) without and 115 (48.3%) with the mobile app. The mean number of total correct answers of ChatGPT was significantly lower than those of physicians without [5.62 (95% CI: 5.32-5.92)] and with the mobile app [7.71 (95% CI: 7.39-8.03); P < 0.001].</p><p><strong>Conclusions: </strong>Large language models developed with artificial intelligence require further refinements to serve as reliable assistants in clinical practice.</p>","PeriodicalId":15457,"journal":{"name":"Journal of clinical gastroenterology","volume":" ","pages":"1022-1027"},"PeriodicalIF":2.8000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of clinical gastroenterology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1097/MCG.0000000000001979","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/2/7 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"GASTROENTEROLOGY & HEPATOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objective: To determine whether a publicly available advanced language model could help determine appropriate colorectal cancer (CRC) screening and surveillance recommendations.

Background: Poor physician knowledge or inability to accurately recall recommendations might affect adherence to CRC screening guidelines. Adoption of newer technologies can help improve the delivery of such preventive care services.

Methods: An assessment with 10 multiple-choice questions, comprising 5 CRC screening and 5 CRC surveillance clinical vignettes, was input into Chat Generative Pretrained Transformer (ChatGPT) 3.5 in 4 separate sessions. Responses were recorded and screened for accuracy to determine the reliability of the tool. The mean number of correct answers was then compared against a control group of gastroenterologists and colorectal surgeons answering the same questions with and without the help of a previously validated CRC screening mobile app.
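The study queried the public ChatGPT 3.5 interface directly, so the following is only a rough sketch of how a comparable repeated-session evaluation could be scripted, here against the OpenAI API. The model name, the sample vignette, the answer key, and the run_session helper are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: the study used the public ChatGPT 3.5 interface.
# Model name, vignette text, and answer key below are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical (prompt, correct letter) pairs; the real study used 10 vignettes.
QUESTIONS = [
    ("A 45-year-old average-risk patient asks when to begin CRC screening. "
     "A) age 40  B) age 45  C) age 50  D) age 55", "B"),
    # ... the remaining 9 vignettes would follow the same shape
]

def run_session() -> int:
    """Ask every question once and count the correct single-letter answers."""
    correct = 0
    for prompt, key in QUESTIONS:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": prompt + "\nAnswer with a single letter."}],
        )
        answer = resp.choices[0].message.content.strip().upper()
        correct += answer.startswith(key)
    return correct

# 4 independent sessions, mirroring the study's repeated-session design.
scores = [run_session() for _ in range(4)]
print("per-session correct answers:", scores)
```

Scoring each response as a single letter keeps the comparison against the physicians' multiple-choice results straightforward, and repeating the run in independent sessions is what exposes the answer inconsistency reported below.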

Results: The average overall performance of ChatGPT was 45%. The mean number of correct answers was 2.75 (95% CI: 2.26-3.24), 1.75 (95% CI: 1.26-2.24), and 4.5 (95% CI: 3.93-5.07) for screening, surveillance, and total questions, respectively. ChatGPT was inconsistent, giving different answers to 4 of the questions across sessions. A total of 238 physicians also completed the assessment: 123 (51.7%) without and 115 (48.3%) with the mobile app. ChatGPT's mean number of total correct answers was significantly lower than that of physicians both without [5.62 (95% CI: 5.32-5.92)] and with the mobile app [7.71 (95% CI: 7.39-8.03); P < 0.001].
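The reported intervals appear consistent with a normal-approximation (z = 1.96) 95% CI on the per-session scores. As a worked example, the hypothetical session totals below reproduce the published overall mean of 4.5 and the 3.93-5.07 interval; the actual per-session scores are not given in the abstract.

```python
# Hypothetical per-session totals; only the summary statistics
# (mean 4.5, 95% CI 3.93-5.07 over 4 sessions) appear in the abstract.
import statistics as stats
from math import sqrt

scores = [5, 4, 5, 4]  # illustrative values consistent with the summary

n = len(scores)
mean = stats.mean(scores)
sem = stats.stdev(scores) / sqrt(n)  # standard error of the mean

z = 1.96  # two-sided 95% critical value under the normal approximation
low, high = mean - z * sem, mean + z * sem
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
# -> mean = 4.50, 95% CI = (3.93, 5.07)
```

With only 4 sessions, a t-based interval (t ≈ 3.182 for 3 degrees of freedom) would be noticeably wider, which suggests the normal approximation was used for the published intervals.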

Conclusions: Large language models developed with artificial intelligence require further refinements to serve as reliable assistants in clinical practice.

Source Journal
Journal of Clinical Gastroenterology (Medicine - Gastroenterology & Hepatology)
CiteScore: 5.60
Self-citation rate: 3.40%
Annual articles: 339
Review turnaround: 3-8 weeks
Journal description: Journal of Clinical Gastroenterology gathers the world's latest, most relevant clinical studies and reviews, case reports, and technical expertise in a single source. Regular features include cutting-edge, peer-reviewed articles and clinical reviews that put the latest research and development into the context of your practice. Also included are biographies, focused organ reviews, practice management, and therapeutic recommendations.