Jacob S Hershenhouse, Daniel Mokhtar, Michael B Eppler, Severin Rodler, Lorenzo Storino Ramacciotti, Conner Ganjavi, Brian Hom, Ryan J Davis, John Tran, Giorgio Ivan Russo, Andrea Cocci, Andre Abreu, Inderbir Gill, Mihir Desai, Giovanni E Cacciamani
{"title":"Accuracy, readability, and understandability of large language models for prostate cancer information to the public.","authors":"Jacob S Hershenhouse, Daniel Mokhtar, Michael B Eppler, Severin Rodler, Lorenzo Storino Ramacciotti, Conner Ganjavi, Brian Hom, Ryan J Davis, John Tran, Giorgio Ivan Russo, Andrea Cocci, Andre Abreu, Inderbir Gill, Mihir Desai, Giovanni E Cacciamani","doi":"10.1038/s41391-024-00826-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Generative Pretrained Model (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer related questions from both the physician and public perspective while optimizing outputs for patient consumption.</p><p><strong>Methods: </strong>Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were re-inputted into ChatGPT to create simplified summaries understandable at a sixth-grade level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity using a 5-point Likert scale. Furthermore, two independent reviewers evaluated the layperson summaries on correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk). Participants rated the clarity and demonstrated their understanding through a multiple-choice question.</p><p><strong>Results: </strong>GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across 9 scenarios. GPT-generated simplified layperson summaries of this output was rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. Mean readability of layperson summaries was higher than original GPT outputs ([original ChatGPT v. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5(9.1) v. 70.2(11.2), <0.0001; Gunning Fog: 15.8(1.7) v. 9.5(2.0), p < 0.0001; Flesch Grade Level: 12.8(1.2) v. 7.4(1.7), p < 0.0001; Coleman Liau: 13.7(2.1) v. 8.6(2.4), 0.0002; Smog index: 11.8(1.2) v. 6.7(1.8), <0.0001; Automated Readability Index: 13.1(1.4) v. 7.5(2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).</p><p><strong>Conclusion: </strong>GPT shows promise for correct patient education for prostate cancer-related contents, but the technology is not designed for delivering patients information. 
Prompting the model to respond with accuracy, completeness, clarity and readability may enhance its utility when used for GPT-powered medical chatbots.</p>","PeriodicalId":20727,"journal":{"name":"Prostate Cancer and Prostatic Diseases","volume":" ","pages":""},"PeriodicalIF":5.1000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Prostate Cancer and Prostatic Diseases","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41391-024-00826-y","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ONCOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background: Generative Pretrained Transformer (GPT) chatbots have gained popularity since the public release of ChatGPT. Studies have evaluated the ability of different GPT models to provide information about medical conditions. To date, no study has assessed the quality of ChatGPT outputs to prostate cancer-related questions from both the physician and public perspective while optimizing the outputs for patient consumption.
Methods: Nine prostate cancer-related questions, identified through Google Trends (Global), were categorized into diagnosis, treatment, and postoperative follow-up. These questions were processed using ChatGPT 3.5, and the responses were recorded. Subsequently, these responses were fed back into ChatGPT to create simplified summaries understandable at a sixth-grade reading level. Readability of both the original ChatGPT responses and the layperson summaries was evaluated using validated readability tools. A survey was conducted among urology providers (urologists and urologists in training) to rate the original ChatGPT responses for accuracy, completeness, and clarity on a 5-point Likert scale. In addition, two independent reviewers evaluated the layperson summaries on a correctness trifecta: accuracy, completeness, and decision-making sufficiency. Public assessment of the simplified summaries' clarity and understandability was carried out through Amazon Mechanical Turk (MTurk); participants rated the clarity and demonstrated their understanding through a multiple-choice question.
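As context for this two-step workflow (generate an answer, then re-prompt for a sixth-grade summary, then score both texts), a minimal Python sketch is shown below. The model name (gpt-3.5-turbo), the prompt wording, and the use of the openai and textstat packages are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of the Methods workflow: (1) ask ChatGPT a prostate
# cancer question, (2) re-prompt for a sixth-grade-level summary,
# (3) score both texts with standard readability tools.
from openai import OpenAI
import textstat

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used ChatGPT 3.5
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

question = "What are the treatment options for localized prostate cancer?"
original = ask(question)
simplified = ask(
    "Rewrite the following text so it is understandable at a "
    f"sixth-grade reading level:\n\n{original}"
)

# Readability scoring for both versions (subset of the metrics reported).
for label, text in [("original", original), ("simplified", simplified)]:
    print(label,
          textstat.flesch_reading_ease(text),
          textstat.gunning_fog(text),
          textstat.flesch_kincaid_grade(text))
```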
Results: GPT-generated output was deemed correct by 71.7% to 94.3% of raters (36 urologists, 17 urology residents) across the 9 scenarios. GPT-generated simplified layperson summaries of this output were rated as accurate in 8 of 9 (88.9%) scenarios and sufficient for a patient to make a decision in 8 of 9 (88.9%) scenarios. The layperson summaries were more readable than the original GPT outputs ([original ChatGPT v. simplified ChatGPT, mean (SD), p-value] Flesch Reading Ease: 36.5 (9.1) v. 70.2 (11.2), p < 0.0001; Gunning Fog: 15.8 (1.7) v. 9.5 (2.0), p < 0.0001; Flesch-Kincaid Grade Level: 12.8 (1.2) v. 7.4 (1.7), p < 0.0001; Coleman-Liau Index: 13.7 (2.1) v. 8.6 (2.4), p = 0.0002; SMOG Index: 11.8 (1.2) v. 6.7 (1.8), p < 0.0001; Automated Readability Index: 13.1 (1.4) v. 7.5 (2.1), p < 0.0001). MTurk workers (n = 514) rated the layperson summaries as correct (89.5-95.7%) and correctly understood the content (63.0-87.4%).
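For readers unfamiliar with these metrics, the two Flesch scores are standard functions of average sentence length and syllables per word: a higher Flesch Reading Ease indicates easier text, while the grade-level indices approximate a US school grade. The standard formulas are:

```latex
% Standard Flesch formulas; W = total words, S = total sentences, Y = total syllables.
\text{Flesch Reading Ease} = 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W}
\qquad
\text{Flesch--Kincaid Grade Level} = 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59
```

So moving from a Flesch Reading Ease of about 36.5 to 70.2 corresponds roughly to a shift from college-level text to plain English readable by most adults, consistent with the drop in grade-level indices reported above.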
Conclusion: GPT shows promise for providing correct patient education on prostate cancer-related content, but the technology is not designed for delivering information to patients. Prompting the model to respond with accuracy, completeness, clarity, and readability in mind may enhance its utility in GPT-powered medical chatbots.
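One way such a prompting strategy could be operationalized is to encode the accuracy, completeness, clarity, and readability requirements in a system prompt. The snippet below is a hypothetical sketch under those assumptions (prompt wording, model name, and temperature are all illustrative), not a validated clinical configuration.

```python
# Hypothetical system-prompt configuration for a GPT-powered
# patient-education chatbot; wording and parameters are assumptions.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You answer patient questions about prostate cancer. "
    "Be accurate and complete, state clearly when evidence is uncertain, "
    "write at roughly a sixth-grade reading level, "
    "and remind the patient to confirm any decision with their clinician."
)

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",       # assumed model
        temperature=0.2,             # lower temperature for more consistent wording
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```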
About the journal:
Prostate Cancer and Prostatic Diseases covers all aspects of prostatic diseases, in particular prostate cancer, the subject of intensive basic and clinical research worldwide. The journal also reports on exciting new developments in diagnosis, surgery, radiotherapy, drug discovery and medical management.
Prostate Cancer and Prostatic Diseases is of interest to surgeons, oncologists and clinicians treating patients and to those involved in research into diseases of the prostate. The journal covers three main areas: prostate cancer, male LUTS and prostatitis.
Prostate Cancer and Prostatic Diseases publishes original research articles, reviews, topical comment and critical appraisals of scientific meetings and the latest books. The journal also contains a calendar of forthcoming scientific meetings. The Editors and a distinguished Editorial Board ensure that submitted articles receive fast and efficient attention and are refereed to the highest possible scientific standard. A fast track system is available for topical articles of particular significance.