Accuracy, readability, and understandability of large language models for prostate cancer information to the public.

Impact Factor 5.1 · CAS Zone 2 (Medicine) · Q1 (Oncology)
Jacob S Hershenhouse, Daniel Mokhtar, Michael B Eppler, Severin Rodler, Lorenzo Storino Ramacciotti, Conner Ganjavi, Brian Hom, Ryan J Davis, John Tran, Giorgio Ivan Russo, Andrea Cocci, Andre Abreu, Inderbir Gill, Mihir Desai, Giovanni E Cacciamani
{"title":"Accuracy, readability, and understandability of large language models for prostate cancer information to the public.","authors":"Jacob S Hershenhouse, Daniel Mokhtar, Michael B Eppler, Severin Rodler, Lorenzo Storino Ramacciotti, Conner Ganjavi, Brian Hom, Ryan J Davis, John Tran, Giorgio Ivan Russo, Andrea Cocci, Andre Abreu, Inderbir Gill, Mihir Desai, Giovanni E Cacciamani","doi":"10.1038/s41391-024-00826-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Generative Pretrained Model (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer related questions from both the physician and public perspective while optimizing outputs for patient consumption.</p><p><strong>Methods: </strong>Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question.</p><p><strong>Results: </strong>GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across 9 scenarios. GPT-generated simplified layperson summaries of this output was rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of layperson summaries was higher than original GPT outputs ([original ChatGPT v. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5(9.1) v. 70.2(11.2), <0.0001; Gunning Fog: 15.8(1.7) v. 9.5(2.0), p < 0.0001; Flesch Grade Level: 12.8(1.2) v. 7.4(1.7), p < 0.0001; Coleman Liau: 13.7(2.1) v. 8.6(2.4), 0.0002; Smog index: 11.8(1.2) v. 6.7(1.8), <0.0001; Automated Readability Index: 13.1(1.4) v. 7.5(2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).</p><p><strong>Conclusion: </strong>GPT shows promise for correct patient education for prostate cancer-related contents, but the technology is not designed for delivering patients information. 
Prompting the model to respond with accuracy, completeness, clarity and readability may enhance its utility when used for GPT-powered medical chatbots.</p>","PeriodicalId":20727,"journal":{"name":"Prostate Cancer and Prostatic Diseases","volume":" ","pages":""},"PeriodicalIF":5.1000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Prostate Cancer and Prostatic Diseases","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41391-024-00826-y","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ONCOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Background: Generative Pre-trained Transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT responses to prostate cancer-related questions from both the physician and public perspectives while optimizing the outputs for patient consumption.

Methods: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question.
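To make the workflow concrete, the sketch below shows a minimal version of the two-step pipeline described above: ask ChatGPT 3.5 a question, re-enter the answer with a request for a sixth-grade summary, and score both texts on the six readability indices reported in the Results. The abstract does not name the readability software or the exact prompt wording, so the use of the textstat package and the prompt text here are assumptions.

```python
# Minimal sketch of the pipeline described in the Methods (assumed details noted below).
from openai import OpenAI
import textstat

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str) -> str:
    """Send a single-turn prompt to gpt-3.5-turbo and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example question; the study drew its nine questions from Google Trends.
question = "What are the treatment options for localized prostate cancer?"
original = ask(question)

# Re-enter the original response and ask for a layperson summary (assumed wording).
summary = ask(
    "Rewrite the following text so it is understandable at a sixth-grade "
    f"reading level:\n\n{original}"
)

# Score both texts with the six readability indices reported in the Results.
indices = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,
    "Gunning Fog": textstat.gunning_fog,
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Coleman-Liau": textstat.coleman_liau_index,
    "SMOG Index": textstat.smog_index,
    "Automated Readability Index": textstat.automated_readability_index,
}
for name, score in indices.items():
    print(f"{name}: original={score(original):.1f}, simplified={score(summary):.1f}")
```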

Results: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across the 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of the layperson summaries was higher than that of the original GPT outputs ([original ChatGPT vs. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) vs. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) vs. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level: 12.8 (1.2) vs. 7.4 (1.7), p < 0.0001; Coleman-Liau: 13.7 (2.1) vs. 8.6 (2.4), p = 0.0002; SMOG Index: 11.8 (1.2) vs. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) vs. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).
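For context on how to read these scales, the two Flesch measures are defined by the standard published formulas below (reproduced from the readability literature, not from the paper): higher Flesch Reading Ease means easier text, while the grade-level indices fall as text becomes simpler.

```latex
\[
\text{Flesch Reading Ease} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}
\]
\[
\text{Flesch-Kincaid Grade Level} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
\]
```

Under these definitions, the jump from 36.5 to 70.2 in Flesch Reading Ease corresponds roughly to moving from college-level prose to plain English, consistent with the drop of about five grade levels across the other indices.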

Conclusion: GPT shows promise for delivering correct patient education on prostate cancer-related content, but the technology is not designed for delivering information to patients. Prompting the model to respond with accuracy, completeness, clarity, and readability may enhance its utility in GPT-powered medical chatbots.
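One way to act on this conclusion is to bake the quality criteria into the prompt itself. The wording below is purely illustrative (the study does not report such a prompt), and the question shown is a hypothetical example.

```python
# Illustrative sketch: encode the four criteria named in the conclusion in a system prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assumed wording, not taken from the study.
QUALITY_SYSTEM_PROMPT = (
    "You are a patient-education assistant for prostate cancer questions. "
    "Answer accurately and completely, be clear about when a clinician should "
    "be consulted, and write at a sixth-grade reading level."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": QUALITY_SYSTEM_PROMPT},
        {"role": "user", "content": "What does a PSA level of 6 ng/mL mean?"},
    ],
)
print(response.choices[0].message.content)
```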


Source Journal
Prostate Cancer and Prostatic Diseases
Category: Medicine - Urology & Nephrology
CiteScore: 10.00
Self-citation rate: 6.20%
Publication volume: 142 articles per year
Review turnaround: 6-12 weeks
Journal overview: Prostate Cancer and Prostatic Diseases covers all aspects of prostatic diseases, in particular prostate cancer, the subject of intensive basic and clinical research worldwide. The journal also reports on new developments in diagnosis, surgery, radiotherapy, drug discovery, and medical management. It is of interest to surgeons, oncologists, and clinicians treating patients, and to those involved in research into diseases of the prostate. The journal covers three main areas: prostate cancer, male LUTS, and prostatitis. It publishes original research articles, reviews, topical comment, and critical appraisals of scientific meetings and the latest books, and also contains a calendar of forthcoming scientific meetings. The Editors and a distinguished Editorial Board ensure that submitted articles receive fast and efficient attention and are refereed to the highest possible scientific standard. A fast-track system is available for topical articles of particular significance.