Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe
{"title":"基于协会指南的ChatGPT在回答儿科泌尿科问题上的表现评估。","authors":"Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe","doi":"10.5489/cuaj.9238","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.</p><p><strong>Methods: </strong>A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance was used for statistical analysis.</p><p><strong>Results: </strong>ChatGPT performed best in the domain of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). \"Easy\" questions (2.31/3.00±0.09) had greater comprehensiveness scores compared to \"medium\" (1.92/3.00±0.07, p=0.003) and \"difficult\" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and in concordance with EAU-based information (2.10±0.41) compared to AUA (1.95±0.41, p=0.04).</p><p><strong>Conclusions: </strong>ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.</p>","PeriodicalId":50613,"journal":{"name":"Cuaj-Canadian Urological Association Journal","volume":" ","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of ChatGPT's performance on answering pediatric urology questions based on association guidelines.\",\"authors\":\"Wyatt MacNevin, Nicholas Dawe, Laura Harkness, Budoor Salman, Daniel T Keefe\",\"doi\":\"10.5489/cuaj.9238\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to successfully answer common pediatric urology-based questions remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.</p><p><strong>Methods: </strong>A list of common pediatric urology questions of varying difficulty was developed in association with publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. 
Descriptive statistics and analysis of variance was used for statistical analysis.</p><p><strong>Results: </strong>ChatGPT performed best in the domain of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and VUR (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). \\\"Easy\\\" questions (2.31/3.00±0.09) had greater comprehensiveness scores compared to \\\"medium\\\" (1.92/3.00±0.07, p=0.003) and \\\"difficult\\\" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had greater comprehensiveness scores across all guidelines. ChatGPT was more accurate and in concordance with EAU-based information (2.10±0.41) compared to AUA (1.95±0.41, p=0.04).</p><p><strong>Conclusions: </strong>ChatGPT answered questions with high levels of appropriateness and comprehensiveness. ChatGPT performed best in the areas of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced to EAU and CUA compared to AUA.</p>\",\"PeriodicalId\":50613,\"journal\":{\"name\":\"Cuaj-Canadian Urological Association Journal\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cuaj-Canadian Urological Association Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.5489/cuaj.9238\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cuaj-Canadian Urological Association Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5489/cuaj.9238","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Evaluation of ChatGPT's performance on answering pediatric urology questions based on association guidelines.
Introduction: ChatGPT has been shown to provide accurate and complete responses to clinically focused questions, although its ability to answer common pediatric urology questions successfully remains unexplored. Furthermore, the concordance of ChatGPT's answers with association recommendations has yet to be analyzed.
Methods: A list of common pediatric urology questions of varying difficulty was developed based on publicly available guidelines and resources from the Canadian Urological Association (CUA), American Urological Association (AUA), and the European Association of Urology (EAU). Questions were administered individually using three separate functions, and responses were evaluated for comprehensiveness and accuracy using a Likert scale. Descriptive statistics and analysis of variance were used for statistical analysis.
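The abstract does not include analysis code; as a rough illustration only, the following is a minimal Python sketch of how Likert comprehensiveness scores might be compared across question-difficulty groups with a one-way ANOVA, as described in the methods. All variable names and score values below are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: one-way ANOVA on Likert comprehensiveness scores
# grouped by question difficulty. Scores below are illustrative only.
from scipy import stats

easy_scores = [2.4, 2.3, 2.2, 2.4, 2.3]       # hypothetical "easy" question ratings (0-3 scale)
medium_scores = [2.0, 1.9, 1.8, 2.0, 1.9]     # hypothetical "medium" question ratings
difficult_scores = [1.9, 1.8, 1.7, 1.9, 2.0]  # hypothetical "difficult" question ratings

# Descriptive statistics for each group
for label, scores in [("easy", easy_scores),
                      ("medium", medium_scores),
                      ("difficult", difficult_scores)]:
    mean = sum(scores) / len(scores)
    print(f"{label}: mean = {mean:.2f}")

# One-way ANOVA across the three difficulty groups
f_stat, p_value = stats.f_oneway(easy_scores, medium_scores, difficult_scores)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```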
Results: ChatGPT performed best in the domains of phimosis (mean ± standard deviation: 2.32/3.00±0.57) and vesicoureteral reflux (VUR) (2.11/3.00±0.63) and worst in acute scrotal pathology (1.90/3.00±0.58) and cryptorchidism (1.92/3.00±0.56) (p=0.031). "Easy" questions (2.31/3.00±0.09) had higher comprehensiveness scores than "medium" (1.92/3.00±0.07, p=0.003) and "difficult" questions (1.86/3.00±0.101, p=0.003). Definition-based questions had higher comprehensiveness scores across all guidelines. ChatGPT was more accurate and more concordant with EAU-based information (2.10±0.41) than with AUA-based information (1.95±0.41, p=0.04).
Conclusions: ChatGPT answered questions with high levels of appropriateness and comprehensiveness. It performed best in the domains of phimosis and VUR and worst in acute scrotal pathology. While ChatGPT performed well across all question domains, it performed best when referenced against EAU and CUA guidelines compared to AUA guidelines.
About the journal:
CUAJ is a peer-reviewed, open-access journal devoted to promoting the highest standard of urological patient care through the publication of timely, relevant, evidence-based research and advocacy information.