Performance of large language models on benign prostatic hyperplasia frequently asked questions.

IF 2.6 | Medicine, CAS Tier 3 | Q3 (Endocrinology & Metabolism)
Prostate | Pub Date: 2024-06-01 | Epub Date: 2024-04-01 | DOI: 10.1002/pros.24699
YuNing Zhang, Yijie Dong, Zihan Mei, Yiqing Hou, Minyan Wei, Yat Hin Yeung, Jiale Xu, Qing Hua, LiMei Lai, Ning Li, ShuJun Xia, Chun Zhou, JianQiao Zhou
{"title":"Performance of large language models on benign prostatic hyperplasia frequently asked questions.","authors":"YuNing Zhang, Yijie Dong, Zihan Mei, Yiqing Hou, Minyan Wei, Yat Hin Yeung, Jiale Xu, Qing Hua, LiMei Lai, Ning Li, ShuJun Xia, Chun Zhou, JianQiao Zhou","doi":"10.1002/pros.24699","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Benign prostatic hyperplasia (BPH) is a common condition, yet it is challenging for the average BPH patient to find credible and accurate information about BPH. Our goal is to evaluate and compare the accuracy and reproducibility of large language models (LLMs), including ChatGPT-3.5, ChatGPT-4, and the New Bing Chat in responding to a BPH frequently asked questions (FAQs) questionnaire.</p><p><strong>Methods: </strong>A total of 45 questions related to BPH were categorized into basic and professional knowledge. Three LLM-ChatGPT-3.5, ChatGPT-4, and New Bing Chat-were utilized to generate responses to these questions. Responses were graded as comprehensive, correct but inadequate, mixed with incorrect/outdated data, or completely incorrect. Reproducibility was assessed by generating two responses for each question. All responses were reviewed and judged by experienced urologists.</p><p><strong>Results: </strong>All three LLMs exhibited high accuracy in generating responses to questions, with accuracy rates ranging from 86.7% to 100%. However, there was no statistically significant difference in response accuracy among the three (p > 0.017 for all comparisons). Additionally, the accuracy of the LLMs' responses to the basic knowledge questions was roughly equivalent to that of the specialized knowledge questions, showing a difference of less than 3.5% (GPT-3.5: 90% vs. 86.7%; GPT-4: 96.7% vs. 95.6%; New Bing: 96.7% vs. 93.3%). Furthermore, all three LLMs demonstrated high reproducibility, with rates ranging from 93.3% to 97.8%.</p><p><strong>Conclusions: </strong>ChatGPT-3.5, ChatGPT-4, and New Bing Chat offer accurate and reproducible responses to BPH-related questions, establishing them as valuable resources for enhancing health literacy and supporting BPH patients in conjunction with healthcare professionals.</p>","PeriodicalId":54544,"journal":{"name":"Prostate","volume":" ","pages":"807-813"},"PeriodicalIF":2.6000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Prostate","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/pros.24699","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/4/1 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"ENDOCRINOLOGY & METABOLISM","Score":null,"Total":0}
Citations: 0

Abstract

Background: Benign prostatic hyperplasia (BPH) is a common condition, yet it is challenging for the average BPH patient to find credible and accurate information about it. Our goal was to evaluate and compare the accuracy and reproducibility of three large language models (LLMs), ChatGPT-3.5, ChatGPT-4, and New Bing Chat, in responding to a BPH frequently asked questions (FAQ) questionnaire.

Methods: A total of 45 questions related to BPH were categorized into basic knowledge and professional knowledge. Three LLMs (ChatGPT-3.5, ChatGPT-4, and New Bing Chat) were used to generate responses to these questions. Each response was graded as comprehensive, correct but inadequate, mixed with incorrect/outdated data, or completely incorrect. Reproducibility was assessed by generating two responses for each question. All responses were reviewed and judged by experienced urologists.
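
To make the tallying concrete, here is a minimal Python sketch (not the study's code), assuming that a response counts as accurate when graded "comprehensive" or "correct but inadequate" and that a question counts as reproducible when its two responses fall into the same accuracy category; the example grades are hypothetical placeholders, not the study's data.

    # Hypothetical illustration of the accuracy/reproducibility tallies described above.
    GOOD = {"comprehensive", "correct but inadequate"}  # assumed definition of "accurate"

    def accuracy(grades):
        # Share of responses graded as accurate.
        return sum(g in GOOD for g in grades) / len(grades)

    def reproducibility(run1, run2):
        # Share of questions whose two responses land in the same accuracy category.
        return sum((a in GOOD) == (b in GOOD) for a, b in zip(run1, run2)) / len(run1)

    # Placeholder grades for three questions, two generated responses each.
    run1 = ["comprehensive", "correct but inadequate", "mixed with incorrect/outdated data"]
    run2 = ["comprehensive", "comprehensive", "completely incorrect"]
    print(accuracy(run1), reproducibility(run1, run2))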

Results: All three LLMs exhibited high accuracy in their responses, with accuracy rates ranging from 86.7% to 100%. However, there was no statistically significant difference in response accuracy among the three models (p > 0.017 for all pairwise comparisons). Additionally, each LLM's accuracy on the basic knowledge questions was roughly equivalent to its accuracy on the professional knowledge questions, with a difference of less than 3.5% (GPT-3.5: 90% vs. 86.7%; GPT-4: 96.7% vs. 95.6%; New Bing: 96.7% vs. 93.3%). Furthermore, all three LLMs demonstrated high reproducibility, with rates ranging from 93.3% to 97.8%.
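
The 0.017 significance threshold is consistent with a Bonferroni correction for three pairwise comparisons (0.05 / 3 ≈ 0.017), though the abstract does not name the exact test used. The Python sketch below shows how such pairwise comparisons could be run with Fisher's exact test from SciPy; the per-model correct-answer counts are hypothetical placeholders, not the study's data.

    # Hypothetical pairwise comparison of per-model accuracy with a Bonferroni-corrected alpha.
    from itertools import combinations
    from scipy.stats import fisher_exact

    N_QUESTIONS = 45
    correct = {"ChatGPT-3.5": 38, "ChatGPT-4": 44, "New Bing Chat": 41}  # placeholder counts

    alpha = 0.05 / 3  # Bonferroni correction for three pairwise comparisons, roughly 0.017

    for a, b in combinations(correct, 2):
        # 2x2 table of correct vs. incorrect responses for the two models being compared.
        table = [[correct[a], N_QUESTIONS - correct[a]],
                 [correct[b], N_QUESTIONS - correct[b]]]
        _, p = fisher_exact(table)
        verdict = "significant" if p < alpha else "not significant"
        print(f"{a} vs {b}: p = {p:.3f} ({verdict} at alpha = {alpha:.3f})")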

Conclusions: ChatGPT-3.5, ChatGPT-4, and New Bing Chat offer accurate and reproducible responses to BPH-related questions, establishing them as valuable resources for enhancing health literacy and supporting BPH patients in conjunction with healthcare professionals.

Source journal
Prostate (Medicine: Urology & Nephrology)
CiteScore: 5.10
Self-citation rate: 3.60%
Articles per year: 180
Review time: 1.5 months
Journal description: The Prostate is a peer-reviewed journal dedicated to original studies of this organ and the male accessory glands. It serves as an international medium for these studies, presenting comprehensive coverage of clinical, anatomic, embryologic, physiologic, endocrinologic, and biochemical studies.