Comperative analysis of three chatbot responses on pediatric primary nocturnal enuresis.

IF 2; Zone 3 (Medicine); JCR Q2 (Pediatrics)

Journal of Pediatric Urology Pub Date : 2025-04-30 DOI:10.1016/j.jpurol.2025.04.031

Asya Eylem Boztas, Esra Ensari


Citations: 0

Abstract

Background: This study evaluated the accuracy and reproducibility of the answers given by ChatGPT-4o®, Gemini® and Copilot® to frequently asked questions about pediatric primary nocturnal enuresis.

Methods: Forty frequently asked questions about primary nocturnal enuresis were posed twice, one week apart, to ChatGPT-4o, Gemini and Copilot. One pediatric surgeon and one nephrologist independently scored the answers into four groups: comprehensive/correct (1), incomplete/partially correct (2), a mix of accurate and inaccurate/misleading (3), and completely inaccurate/irrelevant (4). The accuracy and reproducibility of each chatbot's answers were evaluated.
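
The scoring scheme described in the Methods can be sketched in code. This is a minimal illustration and not the authors' procedure: the abstract does not say how the two raters' scores were combined or how reproducibility was defined, so the sketch assumes one score per answer and treats an answer as reproducible when both rounds receive the same score. The function names and the toy data are hypothetical.

```python
def accuracy_rate(scores):
    """Percentage of answers scored 1 (comprehensive/correct)."""
    return 100 * sum(1 for s in scores if s == 1) / len(scores)


def reproducibility_rate(round1, round2):
    """Percentage of questions with the same score in both rounds.

    Assumed definition; the abstract does not spell one out.
    """
    if len(round1) != len(round2):
        raise ValueError("both rounds must cover the same questions")
    same = sum(1 for a, b in zip(round1, round2) if a == b)
    return 100 * same / len(round1)


# Toy scores on the 1-4 scale for four questions (not study data).
round1 = [1, 1, 2, 3]
round2 = [1, 2, 2, 3]

print(accuracy_rate(round1))                 # 50.0
print(reproducibility_rate(round1, round2))  # 75.0
```

With real data, each list would hold one score per question (40 per round in this study) for a single chatbot.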

Results: Among these commonly used chatbots, ChatGPT-4o had the highest rate of completely correct responses, followed by Gemini and Copilot. ChatGPT-4o gave the most accurate responses, with 92.5 % of its answers rated completely correct. Gemini answered 50 % of the questions correctly. Copilot was the least successful chatbot in answering questions about nocturnal enuresis, with 45 % of its answers rated completely accurate; in addition, 2.5 % of its responses were completely inaccurate/irrelevant. The reproducibility of ChatGPT-4o, Gemini and Copilot was 85 %, 77.5 % and 70 %, respectively.
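
The reported percentages can be turned back into question counts for a quick sanity check. The sketch below assumes each percentage is taken over the 40 questions; the abstract does not state the denominator explicitly, and the names `TOTAL_QUESTIONS`, `reported` and `to_count` are illustrative.

```python
TOTAL_QUESTIONS = 40  # assumed denominator, from "Forty frequently asked questions"

# Completely correct response rates as reported in the Results (percent).
reported = {"ChatGPT-4o": 92.5, "Gemini": 50.0, "Copilot": 45.0}


def to_count(percent, total=TOTAL_QUESTIONS):
    """Convert a percentage of `total` into a whole number of questions."""
    return round(percent * total / 100)


counts = {name: to_count(p) for name, p in reported.items()}
print(counts)         # {'ChatGPT-4o': 37, 'Gemini': 20, 'Copilot': 18}
print(to_count(2.5))  # 1
```

Under this assumption every reported figure corresponds to a whole number of questions, including the 2.5 % of completely inaccurate Copilot responses (one question).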

Conclusion: Of the three chatbots, ChatGPT-4o provided the highest percentage of accurate responses regarding nocturnal enuresis. Both patients and their parents can use it, especially for simple, low-complexity medical questions. However, it should be used alongside an expert healthcare professional.

Source journal: Journal of Pediatric Urology (Pediatrics; Urology & Nephrology)

CiteScore: 3.70

Self-citation rate: 15.00%

Articles published: 330

Review time: 4-8 weeks

About the journal: The Journal of Pediatric Urology publishes submitted research and clinical articles relating to Pediatric Urology which have been accepted after adequate peer review. It publishes regular articles that have been submitted after invitation, that cover the curriculum of Pediatric Urology, and enable trainee surgeons to attain theoretical competence in the sub-specialty. It publishes regular reviews of pediatric urological articles appearing in other journals. It publishes invited review articles by recognised experts on modern or controversial aspects of the sub-specialty. It enables any affiliated society to advertise society events or information in the journal without charge and will publish abstracts of papers to be read at society meetings.