ChatGPT v4 outperforming v3.5 on cancer treatment recommendations in quality, clinical guideline, and expert opinion concordance.

IF 2.9 | CAS Tier 3 (Medicine) | JCR Q2, HEALTH CARE SCIENCES & SERVICES
DIGITAL HEALTH · Pub Date: 2024-08-14 · eCollection Date: 2024-01-01 · DOI: 10.1177/20552076241269538
Chung-You Tsai, Pai-Yu Cheng, Juinn-Horng Deng, Fu-Shan Jaw, Shyi-Chun Yii
{"title":"ChatGPT v4 outperforming v3.5 on cancer treatment recommendations in quality, clinical guideline, and expert opinion concordance.","authors":"Chung-You Tsai, Pai-Yu Cheng, Juinn-Horng Deng, Fu-Shan Jaw, Shyi-Chun Yii","doi":"10.1177/20552076241269538","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>To assess the quality and alignment of ChatGPT's cancer treatment recommendations (RECs) with National Comprehensive Cancer Network (NCCN) guidelines and expert opinions.</p><p><strong>Methods: </strong>Three urologists performed quantitative and qualitative assessments in October 2023 analyzing responses from ChatGPT-4 and ChatGPT-3.5 to 108 prostate, kidney, and bladder cancer prompts using two zero-shot prompt templates. Performance evaluation involved calculating five ratios: expert-approved/expert-disagreed and NCCN-aligned RECs against total ChatGPT RECs plus coverage and adherence rates to NCCN. Experts rated the response's quality on a 1-5 scale considering correctness, comprehensiveness, specificity, and appropriateness.</p><p><strong>Results: </strong>ChatGPT-4 outperformed ChatGPT-3.5 in prostate cancer inquiries, with an average word count of 317.3 versus 124.4 (<i>p</i> < 0.001) and 6.1 versus 3.9 RECs (<i>p</i> < 0.001). Its rater-approved REC ratio (96.1% vs. 89.4%) and alignment with NCCN guidelines (76.8% vs. 49.1%, <i>p</i> = 0.001) were superior and scored significantly better on all quality dimensions. Across 108 prompts covering three cancers, ChatGPT-4 produced an average of 6.0 RECs per case, with an 88.5% approval rate from raters, 86.7% NCCN concordance, and only a 9.5% disagreement rate. It achieved high marks in correctness (4.5), comprehensiveness (4.4), specificity (4.0), and appropriateness (4.4). Subgroup analyses across cancer types, disease statuses, and different prompt templates were reported.</p><p><strong>Conclusions: </strong>ChatGPT-4 demonstrated significant improvement in providing accurate and detailed treatment recommendations for urological cancers in line with clinical guidelines and expert opinion. However, it is vital to recognize that AI tools are not without flaws and should be utilized with caution. ChatGPT could supplement, but not replace, personalized advice from healthcare professionals.</p>","PeriodicalId":51333,"journal":{"name":"DIGITAL HEALTH","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11325467/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"DIGITAL HEALTH","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/20552076241269538","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Objectives: To assess the quality and alignment of ChatGPT's cancer treatment recommendations (RECs) with National Comprehensive Cancer Network (NCCN) guidelines and expert opinions.

Methods: Three urologists performed quantitative and qualitative assessments in October 2023, analyzing responses from ChatGPT-4 and ChatGPT-3.5 to 108 prostate, kidney, and bladder cancer prompts built from two zero-shot prompt templates. Performance evaluation involved calculating five ratios: expert-approved, expert-disagreed, and NCCN-aligned RECs as proportions of all ChatGPT RECs, plus coverage of and adherence to NCCN guideline options. Experts rated each response's quality on a 1-5 scale for correctness, comprehensiveness, specificity, and appropriateness.
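For illustration only, below is a minimal Python sketch of how the five evaluation ratios described above might be computed from per-REC rater annotations. The data layout, field names, and the exact definitions of "coverage" and "adherence" are assumptions, not the authors' analysis code.

```python
# Minimal sketch of the five evaluation ratios (assumed definitions).
# Each REC is annotated by raters; field names are hypothetical.

def evaluate(recs, nccn_options):
    """recs: list of dicts, one per ChatGPT recommendation (REC), e.g.
         {"approved": True, "disagreed": False, "nccn_aligned": True,
          "matched_nccn_option": "radical prostatectomy"}
       nccn_options: set of guideline-listed treatment options for the case."""
    total = len(recs)
    approved = sum(r["approved"] for r in recs)
    disagreed = sum(r["disagreed"] for r in recs)
    aligned = sum(r["nccn_aligned"] for r in recs)
    # Guideline options mentioned at least once by the model.
    covered = {r["matched_nccn_option"] for r in recs if r["nccn_aligned"]}
    return {
        "approval_rate": approved / total,        # expert-approved RECs / total RECs
        "disagreement_rate": disagreed / total,   # expert-disagreed RECs / total RECs
        "nccn_concordance": aligned / total,      # NCCN-aligned RECs / total RECs
        "nccn_coverage": len(covered) / len(nccn_options),
        "nccn_adherence": aligned / total == 1.0, # assumed: all RECs guideline-aligned
    }
```

Here coverage counts how many guideline-listed options the model mentioned at least once, while concordance is computed per REC; the abstract does not spell out these denominators, so both interpretations are assumptions.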

Results: ChatGPT-4 outperformed ChatGPT-3.5 on prostate cancer inquiries, with an average word count of 317.3 versus 124.4 (p < 0.001) and 6.1 versus 3.9 RECs per response (p < 0.001). Its rater-approved REC ratio (96.1% vs. 89.4%) and alignment with NCCN guidelines (76.8% vs. 49.1%, p = 0.001) were superior, and it scored significantly better on all quality dimensions. Across the 108 prompts covering three cancers, ChatGPT-4 produced an average of 6.0 RECs per case, with an 88.5% approval rate from raters, 86.7% NCCN concordance, and only a 9.5% disagreement rate. It achieved high marks in correctness (4.5), comprehensiveness (4.4), specificity (4.0), and appropriateness (4.4). Subgroup analyses across cancer types, disease statuses, and prompt templates were also reported.
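The abstract reports p-values for the word-count and REC-count comparisons but does not name the statistical tests used. Purely as a hedged illustration, the sketch below runs an independent two-sample t-test on synthetic placeholder data (the values and sample sizes are invented for the example and are not study data).

```python
# Hedged illustration: test choice (independent t-test) and data are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-prompt word counts, centered on the reported means (317.3 vs 124.4);
# spreads and n are arbitrary placeholders, NOT the study's data.
gpt4_words = rng.normal(317.3, 60.0, 36)
gpt35_words = rng.normal(124.4, 40.0, 36)

t, p = stats.ttest_ind(gpt4_words, gpt35_words)
print(f"word count: t = {t:.2f}, p = {p:.3g}")
```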

Conclusions: ChatGPT-4 demonstrated significant improvement in providing accurate and detailed treatment recommendations for urological cancers in line with clinical guidelines and expert opinion. However, it is vital to recognize that AI tools are not without flaws and should be utilized with caution. ChatGPT could supplement, but not replace, personalized advice from healthcare professionals.

Source journal: DIGITAL HEALTH
CiteScore: 2.90
Self-citation rate: 7.70%
Articles published: 302