Wei-Kai Chuang MD, Yung-Shuo Kao MD, Yen-Ting Liu MD, Cho-Yin Lee MD, PhD
{"title":"利用开放式问题和图像评估ChatGPT在放射肿瘤学临床决策中的作用。","authors":"Wei-Kai Chuang MD , Yung-Shuo Kao MD , Yen-Ting Liu MD , Cho-Yin Lee MD, PhD","doi":"10.1016/j.prro.2025.04.009","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>This study assesses the practicality and correctness of Chat Generative Pre-trained Transformer (ChatGPT)-4 and 4O’s answers to clinical inquiries in radiation oncology<span>, and evaluates ChatGPT-4O for staging nasopharyngeal carcinoma (NPC) cases with magnetic resonance (MR) images.</span></div></div><div><h3>Methods and Materials</h3><div>A total of 164 open-ended questions covering representative professional domains (Clinical_G: knowledge on standardized guidelines; Clinical_C: complex clinical scenarios; Nursing: nursing and health education; and Technology: radiation technology and dosimetry) were prospectively formulated by experts and presented to ChatGPT-4 and 4O. Each ChatGPT’s answer was graded as 1 (Directly practical for clinical decision-making), 2 (Correct but inadequate), 3 (Mixed with correct and incorrect information), or 4 (Completely incorrect). ChatGPT-4O was presented with the representative diagnostic MR images of 20 patients with NPC across different T stages, and asked to determine the T stage of each case.</div></div><div><h3>Results</h3><div>The proportions of ChatGPT’s answers that were practical (grade 1) varied across professional domains (<em>P</em> < .01), higher in Nursing (GPT-4: 91.9%; GPT-4O: 94.6%) and Clinical_G (GPT-4: 82.2%; GPT-4O: 88.9%) domains than in Clinical_C (GPT-4: 54.1%; GPT-4O: 62.2%) and Technology (GPT-4: 64.4%; GPT-4O: 77.8%) domains. The proportions of correct (grade 1+2) answers (GPT-4: 89.6%; GPT-4O: 98.8%; <em>P</em> < .01) were universally high across all professional domains. However, ChatGPT-4O failed to stage NPC cases via MR images, indiscriminately assigning T4 to all actually non-T4 cases (<em>κ</em> = 0; 95% CI, −0.253 to 0.253).</div></div><div><h3>Conclusions</h3><div>ChatGPT could be a safe clinical decision-support tool in radiation oncology, because it correctly answered the vast majority of clinical inquiries across professional domains. However, its clinical practicality should be cautiously weighted particularly in the Clinical_C and Technology domains. 
ChatGPT-4O is not yet mature to interpret diagnostic images for cancer staging.</div></div>","PeriodicalId":54245,"journal":{"name":"Practical Radiation Oncology","volume":"15 5","pages":"Pages e412-e423"},"PeriodicalIF":3.5000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing ChatGPT for Clinical Decision-Making in Radiation Oncology, With Open-Ended Questions and Images\",\"authors\":\"Wei-Kai Chuang MD , Yung-Shuo Kao MD , Yen-Ting Liu MD , Cho-Yin Lee MD, PhD\",\"doi\":\"10.1016/j.prro.2025.04.009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><div>This study assesses the practicality and correctness of Chat Generative Pre-trained Transformer (ChatGPT)-4 and 4O’s answers to clinical inquiries in radiation oncology<span>, and evaluates ChatGPT-4O for staging nasopharyngeal carcinoma (NPC) cases with magnetic resonance (MR) images.</span></div></div><div><h3>Methods and Materials</h3><div>A total of 164 open-ended questions covering representative professional domains (Clinical_G: knowledge on standardized guidelines; Clinical_C: complex clinical scenarios; Nursing: nursing and health education; and Technology: radiation technology and dosimetry) were prospectively formulated by experts and presented to ChatGPT-4 and 4O. Each ChatGPT’s answer was graded as 1 (Directly practical for clinical decision-making), 2 (Correct but inadequate), 3 (Mixed with correct and incorrect information), or 4 (Completely incorrect). ChatGPT-4O was presented with the representative diagnostic MR images of 20 patients with NPC across different T stages, and asked to determine the T stage of each case.</div></div><div><h3>Results</h3><div>The proportions of ChatGPT’s answers that were practical (grade 1) varied across professional domains (<em>P</em> < .01), higher in Nursing (GPT-4: 91.9%; GPT-4O: 94.6%) and Clinical_G (GPT-4: 82.2%; GPT-4O: 88.9%) domains than in Clinical_C (GPT-4: 54.1%; GPT-4O: 62.2%) and Technology (GPT-4: 64.4%; GPT-4O: 77.8%) domains. The proportions of correct (grade 1+2) answers (GPT-4: 89.6%; GPT-4O: 98.8%; <em>P</em> < .01) were universally high across all professional domains. However, ChatGPT-4O failed to stage NPC cases via MR images, indiscriminately assigning T4 to all actually non-T4 cases (<em>κ</em> = 0; 95% CI, −0.253 to 0.253).</div></div><div><h3>Conclusions</h3><div>ChatGPT could be a safe clinical decision-support tool in radiation oncology, because it correctly answered the vast majority of clinical inquiries across professional domains. However, its clinical practicality should be cautiously weighted particularly in the Clinical_C and Technology domains. 
ChatGPT-4O is not yet mature to interpret diagnostic images for cancer staging.</div></div>\",\"PeriodicalId\":54245,\"journal\":{\"name\":\"Practical Radiation Oncology\",\"volume\":\"15 5\",\"pages\":\"Pages e412-e423\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Practical Radiation Oncology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1879850025001158\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ONCOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Practical Radiation Oncology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1879850025001158","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ONCOLOGY","Score":null,"Total":0}
Assessing ChatGPT for Clinical Decision-Making in Radiation Oncology, With Open-Ended Questions and Images
Purpose
This study assesses the practicality and correctness of answers from Chat Generative Pre-trained Transformer (ChatGPT)-4 and ChatGPT-4O to clinical inquiries in radiation oncology, and evaluates ChatGPT-4O's ability to stage nasopharyngeal carcinoma (NPC) cases from magnetic resonance (MR) images.
Methods and Materials
A total of 164 open-ended questions covering representative professional domains (Clinical_G: knowledge of standardized guidelines; Clinical_C: complex clinical scenarios; Nursing: nursing and health education; and Technology: radiation technology and dosimetry) were prospectively formulated by experts and presented to ChatGPT-4 and ChatGPT-4O. Each of ChatGPT's answers was graded as 1 (directly practical for clinical decision-making), 2 (correct but inadequate), 3 (mixed correct and incorrect information), or 4 (completely incorrect). ChatGPT-4O was presented with representative diagnostic MR images of 20 patients with NPC at different T stages and asked to determine the T stage of each case.
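The Methods describe a prompt-and-grade protocol rather than any software, but the querying step is straightforward to automate. Below is a minimal sketch, for illustration only, of how such a batch of open-ended questions might be posed to both models through the OpenAI Python API; the package, model identifiers, and sample questions are assumptions not taken from the paper, and the 1-to-4 grading remains a manual expert task.

```python
# Illustration only: the paper does not describe its tooling, and the four-point
# rubric below is applied manually by experts, not by code. This sketch assumes
# the official `openai` Python package (v1 client) and API access to the models.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The study's four-point grading rubric, encoded here for bookkeeping.
RUBRIC = {
    1: "Directly practical for clinical decision-making",
    2: "Correct but inadequate",
    3: "Mixed with correct and incorrect information",
    4: "Completely incorrect",
}

# Hypothetical stand-ins for the study's 164 expert-formulated questions.
questions = [
    ("Clinical_G", "Per current guidelines, when is postmastectomy radiation "
                   "therapy indicated?"),
    ("Technology", "How does a multileaf collimator shape the treatment field "
                   "in intensity modulated radiation therapy?"),
]

for model in ("gpt-4", "gpt-4o"):  # the two models compared in the study
    for domain, question in questions:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answer = response.choices[0].message.content
        # Each answer would then be graded 1-4 against RUBRIC by the experts.
        print(f"[{model} | {domain}] {answer[:80]}...")
```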
Results
The proportion of ChatGPT's answers that were practical (grade 1) varied across professional domains (P < .01), with higher rates in the Nursing (GPT-4: 91.9%; GPT-4O: 94.6%) and Clinical_G (GPT-4: 82.2%; GPT-4O: 88.9%) domains than in the Clinical_C (GPT-4: 54.1%; GPT-4O: 62.2%) and Technology (GPT-4: 64.4%; GPT-4O: 77.8%) domains. The proportion of correct (grade 1 + 2) answers (GPT-4: 89.6%; GPT-4O: 98.8%; P < .01) was uniformly high across all professional domains. However, ChatGPT-4O failed to stage the NPC cases from MR images, assigning T4 indiscriminately, even to all cases that were actually non-T4 (κ = 0; 95% CI, −0.253 to 0.253).
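As background on the agreement statistic above (not from the paper): Cohen's κ equals zero whenever a rater assigns a single class to every case, because observed agreement then exactly matches the agreement expected by chance. A minimal sketch with scikit-learn, using made-up T-stage labels for 20 hypothetical cases:

```python
# Illustration only: made-up ground-truth T stages for 20 hypothetical NPC cases.
# A rater that calls every case T4 always yields Cohen's kappa = 0, regardless of
# the true class distribution: observed agreement equals chance agreement.
from sklearn.metrics import cohen_kappa_score

truth = ["T1"] * 5 + ["T2"] * 5 + ["T3"] * 5 + ["T4"] * 5  # reference staging
model = ["T4"] * 20                                        # every case called T4

print(cohen_kappa_score(truth, model))  # 0.0
```

With only 20 cases, the confidence interval around κ = 0 is necessarily wide, consistent with the −0.253 to 0.253 interval reported.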
Conclusions
ChatGPT could be a safe clinical decision-support tool in radiation oncology, because it correctly answered the vast majority of clinical inquiries across professional domains. However, its clinical practicality should be weighed cautiously, particularly in the Clinical_C and Technology domains. ChatGPT-4O is not yet mature enough to interpret diagnostic images for cancer staging.
Journal Introduction:
The overarching mission of Practical Radiation Oncology is to improve the quality of radiation oncology practice. PRO's purpose is to document the state of current practice, providing background for those in training and continuing education for practitioners, through discussion and illustration of new techniques, evaluation of current practices, and publication of case reports. PRO strives to provide its readers content that emphasizes knowledge "with a purpose." The content of PRO includes:
Original articles focusing on patient safety, quality measurement, or quality improvement initiatives
Original articles focusing on imaging, contouring, target delineation, simulation, treatment planning, immobilization, organ motion, and other practical issues
ASTRO guidelines, position papers, and consensus statements
Essays that highlight enriching personal experiences in caring for cancer patients and their families.