Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe
{"title":"基于协会指南的ChatGPT在回答儿科泌尿科问题上的表现评估。","authors":"Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe","doi":"10.5489/cuaj.9238","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.</p><p><strong>Methods: </strong>A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance was used for statistical analysis.</p><p><strong>Results: </strong>ChatGPT performed best in the domain of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). \"Easy\" questions (2.31/3.00±0.09) had greater comprehensiveness scores compared to \"medium\" (1.92/3.00±0.07, p=0.003) and \"difficult\" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and in concordance with EAU-based information (2.10±0.41) compared to AUA (1.95±0.41, p=0.04).</p><p><strong>Conclusions: </strong>ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.</p>","PeriodicalId":50613,"journal":{"name":"Cuaj-Canadian Urological Association Journal","volume":" ","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of ChatGPT's performance on answering pediatric urology questions based on association guidelines.\",\"authors\":\"Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe\",\"doi\":\"10.5489/cuaj.9238\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.</p><p><strong>Methods: </strong>A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. 
Descriptive statistics and analysis of variance was used for statistical analysis.</p><p><strong>Results: </strong>ChatGPT performed best in the domain of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). \\\"Easy\\\" questions (2.31/3.00±0.09) had greater comprehensiveness scores compared to \\\"medium\\\" (1.92/3.00±0.07, p=0.003) and \\\"difficult\\\" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and in concordance with EAU-based information (2.10±0.41) compared to AUA (1.95±0.41, p=0.04).</p><p><strong>Conclusions: </strong>ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.</p>\",\"PeriodicalId\":50613,\"journal\":{\"name\":\"Cuaj-Canadian Urological Association Journal\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cuaj-Canadian Urological Association Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.5489/cuaj.9238\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cuaj-Canadian Urological Association Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5489/cuaj.9238","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Evaluation of ChatGPT's performance on answering pediatric urology questions based on association guidelines.
Introduction: ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to answer common pediatric urology questions successfully remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.
Methods: A list of common pediatric urology questions of varying difficulty was developed based on publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance were used for statistical analysis.
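The abstract does not include analysis code; as a rough illustration only, the following is a minimal Python sketch of how Likert comprehensiveness scores might be compared across question-difficulty groups with a one-way ANOVA, as described in the methods. All variable names and score values below are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: one-way ANOVA on Likert comprehensiveness scores
# grouped by question difficulty. Scores below are illustrative only.
from scipy import stats

easy_scores = [2.4, 2.3, 2.2, 2.4, 2.3]       # hypothetical "easy" question ratings (0-3 scale)
medium_scores = [2.0, 1.9, 1.8, 2.0, 1.9]     # hypothetical "medium" question ratings
difficult_scores = [1.9, 1.8, 1.7, 1.9, 2.0]  # hypothetical "difficult" question ratings

# Descriptive statistics for each group
for label, scores in [("easy", easy_scores),
                      ("medium", medium_scores),
                      ("difficult", difficult_scores)]:
    mean = sum(scores) / len(scores)
    print(f"{label}: mean = {mean:.2f}")

# One-way ANOVA across the three difficulty groups
f_stat, p_value = stats.f_oneway(easy_scores, medium_scores, difficult_scores)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```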
Results: ChatGPT performed best in the domains of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and vesicoureteral reflux (VUR) (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). "Easy" questions (2.31/3.00±0.09) had higher comprehensiveness scores than "medium" (1.92/3.00±0.07, p=0.003) and "difficult" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had higher comprehensiveness scores across all guidelines. ChatGPT was more accurate and more concordant with EAU-based information (2.10±0.41) than with AUA-based information (1.95±0.41, p=0.04).
Conclusions: ChatGPT answered questions with high levels of appropriateness and comprehensiveness. It performed best in the domains of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced against EAU and CUA guidelines compared to AUA guidelines.
About the journal:
CUAJ is a peer-reviewed, open-access journal devoted to promoting the highest standard of urological patient care through the publication of timely, relevant, evidence-based research and advocacy information.