Dr. LLM Will See You Now: The Ability of ChatGPT to Provide Geographically Tailored Colorectal Cancer Screening and Surveillance Recommendations
Aisling Zeng, Jacqueline Steinke, Horea-Florin Bocse, Matteo De Pastena
Journal of Clinical Medicine, 14(14). Published 2025-07-18. DOI: 10.3390/jcm14145101 (https://doi.org/10.3390/jcm14145101). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12294925/pdf/
Citations: 0
Abstract
Background/Objectives: This study evaluates the performance of a large language model (LLM) in providing geographically tailored colorectal cancer screening and surveillance recommendations to gastrointestinal surgeons. Methods: Fifty-four patient cases, varying by age and family history, were developed based on colorectal cancer guidelines. Standardized prompts with predefined query terms were used to query ChatGPT-4.5 on 18 April 2025, from four locations: Canada, Italy, Romania, and the United Kingdom. Responses were classified as "Correct," "Partially Correct," or "Incorrect" based on clinical guidelines and expert recommendations for each country. Outcomes were analyzed using descriptive statistics. Results: ChatGPT provided recommendations on screening eligibility, test interpretation, the management of positive results, and surveillance intervals. Correct recommendations were given for 50.0% (27/54) of cases in Canada, 63.0% (34/54) of cases in Italy, 40.7% (22/54) of cases in Romania, and 55.6% (30/54) of cases in the United Kingdom. Queries in Italian yielded correct guidance for 64.8% (35/54) of cases, while Romanian queries were accurate for 40.7% (22/54) of cases. Notably, Romania and Italy lacked detailed guidelines for polyp management and post-test surveillance. A key finding was the inconsistency between ChatGPT-generated titles and the corresponding recommendations, which may affect the model's reliability in clinical decision-making. Conclusions: ChatGPT-4.5's performance varies by country and language, highlighting inconsistencies in geographically tailored recommendations. This study highlights limitations associated with the training data cutoff and the potential biases introduced by model-generated responses. Healthcare professionals should recognize these limitations and the possible gaps in guideline availability, particularly for high-risk screening, polyp management, and surveillance in certain European countries.
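The percentages reported in the Results can be reproduced from the stated counts. The following is a minimal Python sketch, not the authors' analysis code: the counts and the denominator of 54 come from the abstract, while the names `correct_counts`, `percent_correct`, and `rates` are hypothetical and introduced only for illustration.

```python
# Counts of "Correct" responses out of 54 cases, as reported in the abstract.
# All variable and function names here are illustrative assumptions.
TOTAL_CASES = 54

correct_counts = {
    "Canada": 27,
    "Italy": 34,
    "Romania": 22,
    "United Kingdom": 30,
    "Italian-language queries": 35,
    "Romanian-language queries": 22,
}


def percent_correct(correct: int, total: int = TOTAL_CASES) -> float:
    """Return the share of correct responses as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


# Descriptive statistic per setting: percentage of cases answered correctly.
rates = {name: percent_correct(n) for name, n in correct_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}% ({correct_counts[name]}/{TOTAL_CASES})")
```

Rounding to one decimal place matches the precision used in the abstract, so each computed value can be checked directly against the reported figure.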
About the journal:
Journal of Clinical Medicine (ISSN 2077-0383) is an international, open access scientific journal that provides a platform for advances in health care and clinical practice, the study of direct observation of patients, and general medical research. This multidisciplinary journal is aimed at a wide audience of medical researchers and healthcare professionals.
Unique features of this journal:
Manuscripts regarding original research and ideas are particularly welcome. JCM also accepts reviews, communications, and short notes.
There is no limit to publication length: our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible.