Evaluating ChatGPT's competency in radiation oncology: A comprehensive assessment across clinical scenarios
Sherif Ramadan, Adam Mutsaers, Po-Hsuan Cameron Chen, Glenn Bauman, Vikram Velker, Belal Ahmad, Andrew J Arifin, Timothy K Nguyen, David Palma, Christopher D Goodman
Radiotherapy and Oncology, published 2024-11-19. DOI: 10.1016/j.radonc.2024.110645
Citations: 0
Abstract
Purpose: Artificial intelligence (AI) and machine learning present an opportunity to enhance clinical decision-making in radiation oncology. This study aims to evaluate the competency of ChatGPT, an AI language model, in interpreting clinical scenarios and to assess its oncology knowledge.
Methods and materials: A series of clinical cases was designed covering 12 disease sites. Questions were grouped into domains: epidemiology, staging and workup, clinical management, treatment planning, cancer biology, physics, and surveillance. Royal College-certified radiation oncologists (ROs) reviewed the cases and provided reference solutions. ROs scored responses using a standardized rubric on three criteria: conciseness (focused answers), completeness (addressing all aspects of the question), and correctness (answer aligns with expert opinion). Each criterion was scored from 0 to 5, for a total possible score of 15 per question.
Results: Across the 12 cases, 182 questions were answered, yielding a total AI score of 2317/2730 (84 %; 182 questions × 15 points). Scores by criterion were: completeness (79 %, range: 70-99 %), conciseness (92 %, range: 83-99 %), and correctness (81 %, range: 72-92 %). AI performed best in the domains of epidemiology (93 %) and cancer biology (93 %) and reasonably in staging and workup (89 %), physics (86 %), and surveillance (82 %). Weaker domains included treatment planning (78 %) and clinical management (81 %). Statistical differences were driven by variation in the completeness (p < 0.01) and correctness (p = 0.04) criteria, whereas conciseness scored universally high (p = 0.91). These trends were consistent across disease sites.
Conclusions: ChatGPT showed potential as a tool in radiation oncology, demonstrating a high degree of accuracy in several oncologic domains. However, this study highlights limitations with incorrect and incomplete answers in complex cases.
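For illustration only, here is a minimal sketch (not taken from the paper; class and field names are assumptions) of how the rubric described above could be aggregated: each question receives 0-5 points on each of the three criteria, per-question totals are capped at 15, and the overall percentage is the sum of earned points over the maximum possible (e.g., 182 questions × 15 = 2730 points).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuestionScore:
    """Per-question rubric scores, each on a 0-5 scale (hypothetical field names)."""
    conciseness: int
    completeness: int
    correctness: int

    def total(self) -> int:
        # Maximum possible per question is 15 (3 criteria x 5 points each).
        return self.conciseness + self.completeness + self.correctness


def aggregate(scores: List[QuestionScore]) -> float:
    """Return the overall percentage across all graded questions."""
    earned = sum(s.total() for s in scores)
    possible = 15 * len(scores)
    return 100.0 * earned / possible


if __name__ == "__main__":
    # Toy example with three questions; the study graded 182 questions,
    # giving a maximum of 2730 points (182 x 15).
    demo = [QuestionScore(5, 4, 4), QuestionScore(5, 3, 5), QuestionScore(4, 4, 4)]
    print(f"Overall score: {aggregate(demo):.1f}%")
```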
About the journal:
Radiotherapy and Oncology publishes papers describing original research as well as review articles. It covers areas of interest relating to radiation oncology, including: clinical radiotherapy, combined modality treatment, translational studies, epidemiological outcomes, imaging, dosimetry and radiation therapy planning, experimental work in radiobiology, chemobiology, hyperthermia and tumour biology, as well as data science in radiation oncology and physics aspects relevant to oncology. Papers on more general aspects of interest to the radiation oncologist, including chemotherapy, surgery, and immunology, are also published.