Evaluation of ChatGPT's performance on answering pediatric urology questions based on association guidelines.

IF 2.0 · Medicine (CAS Tier 4) · Q3 UROLOGY & NEPHROLOGY
Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe
{"title":"基于协会指南的ChatGPT在回答儿科泌尿科问题上的表现评估。","authors":"Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe","doi":"10.5489/cuaj.9238","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.</p><p><strong>Methods: </strong>A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance was used for statistical analysis.</p><p><strong>Results: </strong>ChatGPT performed best in the domain of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). \"Easy\" questions (2.31/3.00±0.09) had greater comprehensiveness scores compared to \"medium\" (1.92/3.00±0.07, p=0.003) and \"difficult\" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and in concordance with EAU-based information (2.10±0.41) compared to AUA (1.95±0.41, p=0.04).</p><p><strong>Conclusions: </strong>ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.</p>","PeriodicalId":50613,"journal":{"name":"Cuaj-Canadian Urological Association Journal","volume":" ","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of ChatGPT's performance on answering pediatric urology questions based on association guidelines.\",\"authors\":\"Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe\",\"doi\":\"10.5489/cuaj.9238\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.</p><p><strong>Methods: </strong>A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. 
Descriptive statistics and analysis of variance was used for statistical analysis.</p><p><strong>Results: </strong>ChatGPT performed best in the domain of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). \\\"Easy\\\" questions (2.31/3.00±0.09) had greater comprehensiveness scores compared to \\\"medium\\\" (1.92/3.00±0.07, p=0.003) and \\\"difficult\\\" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and in concordance with EAU-based information (2.10±0.41) compared to AUA (1.95±0.41, p=0.04).</p><p><strong>Conclusions: </strong>ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.</p>\",\"PeriodicalId\":50613,\"journal\":{\"name\":\"Cuaj-Canadian Urological Association Journal\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cuaj-Canadian Urological Association Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.5489/cuaj.9238\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cuaj-Canadian Urological Association Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5489/cuaj.9238","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Introduction: ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.

Methods: A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance were used for statistical analysis.
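
(Note: the abstract does not specify how questions were administered programmatically; they may simply have been entered into the ChatGPT web interface in separate sessions. The sketch below shows one plausible way to submit each question in an isolated context via OpenAI's Python client. The model name, prompt wording, and the `ask_isolated` helper are illustrative assumptions, not details from the study.)

```python
# A minimal sketch of administering each question in an isolated session,
# so earlier answers cannot influence later ones. Assumes the `openai`
# Python package (v1+) and an OPENAI_API_KEY environment variable.
# The model name and helper below are illustrative, not from the study.
from openai import OpenAI

client = OpenAI()

def ask_isolated(question: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a single question with no shared conversation history."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# Example guideline-derived question (illustrative wording):
print(ask_isolated("What is vesicoureteral reflux (VUR) and how is it graded?"))
```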

Results: ChatGPT performed best in the domains of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). "Easy" questions (2.31/3.00±0.09) had greater comprehensiveness scores than "medium" (1.92/3.00±0.07, p=0.003) and "difficult" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT's responses were more accurate and more concordant with EAU-based information (2.10±0.41) than with AUA-based information (1.95±0.41, p=0.04).
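
(For readers unfamiliar with the analysis: the between-domain comparison above is the kind of result a one-way ANOVA over per-domain Likert scores produces. A minimal sketch using NumPy and SciPy follows; the scores in it are fabricated for illustration and are not the study's data.)

```python
# A minimal sketch of the descriptive statistics and one-way ANOVA named in
# the Methods. The Likert scores below are fabricated for illustration only;
# they are NOT the study's data.
import numpy as np
from scipy.stats import f_oneway

scores = {
    "phimosis":       [3, 2, 2, 3, 2, 2, 3, 2],
    "VUR":            [2, 2, 3, 2, 2, 1, 3, 2],
    "acute scrotum":  [2, 1, 2, 2, 1, 2, 2, 2],
    "cryptorchidism": [2, 2, 1, 2, 2, 2, 1, 2],
}

# Descriptive statistics: mean ± sample standard deviation per domain.
for domain, vals in scores.items():
    print(f"{domain}: {np.mean(vals):.2f}/3.00 ± {np.std(vals, ddof=1):.2f}")

# One-way ANOVA across the four domains.
f_stat, p_value = f_oneway(*scores.values())
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```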

Conclusions: ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.

Source journal: Cuaj-Canadian Urological Association Journal (Medicine, Urology & Nephrology)
CiteScore: 2.80
Self-citation rate: 10.50%
Articles per year: 167
Review time: >12 weeks
Journal description: CUAJ is a peer-reviewed, open-access journal devoted to promoting the highest standard of urological patient care through the publication of timely, relevant, evidence-based research and advocacy information.