Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement?

IF 5.1, Zone 2 (Medicine), Q1 ONCOLOGY

Prostate Cancer and Prostatic Diseases, Pub Date: 2024-06-13, DOI: 10.1038/s41391-024-00847-7

Angie K Puerto Nino, Valentina Garcia Perez, Silvia Secco, Cosimo De Nunzio, Riccardo Lombardo, Kari A O Tikkinen, Dean S Elterman

Citations: 0

Abstract

Can ChatGPT provide high-quality patient information on male lower urinary tract symptoms suggestive of benign prostate enlargement?

Background: ChatGPT has recently emerged as a novel resource for patients' disease-specific inquiries. There is, however, limited evidence assessing the quality of the information. We evaluated the accuracy and quality of ChatGPT's responses on male lower urinary tract symptoms (LUTS) suggestive of benign prostate enlargement (BPE) when compared to two reference resources.

Methods: Using patient information websites from the European Association of Urology and the American Urological Association as reference material, we formulated 88 BPE-centric questions for ChatGPT 4.0+. Independently and in duplicate, we compared ChatGPT's responses and the reference material, calculating accuracy through F1 score, precision, and recall metrics. We used a 5-point Likert scale for quality rating. We evaluated examiner agreement using the intraclass correlation coefficient (ICC) and assessed the difference in the quality scores with the Wilcoxon signed-rank test.
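The Wilcoxon signed-rank test named in the Methods compares paired scores by ranking the absolute differences and summing the ranks of the positive and negative differences separately. The abstract does not say exactly which paired scores were compared, so the following is only a minimal sketch of the test statistic under assumed inputs: the function name `wilcoxon_signed_rank_statistic` and the two lists of Likert ratings are hypothetical, not the study's data. In practice the p-value would come from a statistics library.

```python
def wilcoxon_signed_rank_statistic(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Indices of the differences, ordered by absolute size.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical 5-point Likert ratings from two examiners (not study data).
examiner_a = [4, 5, 3, 4, 2]
examiner_b = [3, 5, 4, 2, 1]
print(wilcoxon_signed_rank_statistic(examiner_a, examiner_b))  # (8.0, 2.0)
```

The two rank sums always add up to n(n+1)/2, where n is the number of non-zero differences; a large imbalance between them is what the test treats as evidence of a systematic difference.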

Results: ChatGPT addressed all (88/88) LUTS/BPE-related questions. For the 88 questions, the recorded F1 score was 0.79 (range: 0-1), precision 0.66 (range: 0-1), recall 0.97 (range: 0-1), and the quality score had a median of 4 (range: 1-5). Examiners had a good level of agreement (ICC = 0.86). We found no statistically significant difference between the scores given by the examiners and the overall quality of the responses (p = 0.72).
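The F1 score is conventionally the harmonic mean of precision and recall, so the three summary figures above can be checked against one another. The abstract does not state whether F1 was computed from pooled counts or averaged over per-question values, so the sketch below only tests whether the reported numbers are consistent with the standard definition; the helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Summary values as reported in the Results.
reported_precision = 0.66
reported_recall = 0.97

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.79
```

The result matches the reported F1 of 0.79. The combination of high recall (0.97) and lower precision (0.66) is the pattern expected when responses cover nearly all of the reference content while also including material that the reference sources do not contain.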

Discussion: ChatGPT demonstrated potential utility in educating patients about BPE/LUTS, its prognosis, and treatment, which helps in the decision-making process. One must exercise prudence when recommending it as the sole information outlet. Additional studies are needed to understand the full extent of AI's efficacy in delivering patient education in urology.

Source journal

Prostate Cancer and Prostatic Diseases (Medicine: Urology & Nephrology)

CiteScore: 10.00

Self-citation rate: 6.20%

Articles published: 142

Review time: 6-12 weeks

Journal description: Prostate Cancer and Prostatic Diseases covers all aspects of prostatic diseases, in particular prostate cancer, the subject of intensive basic and clinical research worldwide. The journal also reports on new developments in diagnosis, surgery, radiotherapy, drug discovery and medical management. Prostate Cancer and Prostatic Diseases is of interest to surgeons, oncologists and clinicians treating patients and to those involved in research into diseases of the prostate. The journal covers three main areas: prostate cancer, male LUTS and prostatitis. Prostate Cancer and Prostatic Diseases publishes original research articles, reviews, topical comment and critical appraisals of scientific meetings and the latest books. The journal also contains a calendar of forthcoming scientific meetings. The Editors and a distinguished Editorial Board ensure that submitted articles receive fast and efficient attention and are refereed to the highest possible scientific standard. A fast track system is available for topical articles of particular significance.