Battle of the bots: a comparative analysis of ChatGPT and Bing AI for kidney stone-related questions.

IF 2.8 · CAS Tier 2 (Medicine) · JCR Q2, UROLOGY & NEPHROLOGY
Amber K McMahon, Russell S Terry, Willian E Ito, Wilson R Molina, Bristol B Whiles
World Journal of Urology, vol. 42, no. 1, p. 600. Published 2024-10-29. DOI: 10.1007/s00345-024-05326-1
Cited by: 0

Abstract

Objectives: To evaluate and compare the performance of ChatGPT™ (Open AI®) and Bing AI™ (Microsoft®) for responding to kidney stone treatment-related questions in accordance with the American Urological Association (AUA) guidelines and assess factors such as appropriateness, emphasis on consulting healthcare providers, references, and adherence to guidelines by each chatbot.

Methods: We developed 20 kidney stone evaluation and treatment-related questions based on the AUA Surgical Management of Stones guideline. Questions were asked to ChatGPT and Bing AI chatbots. We compared their responses utilizing the brief DISCERN tool as well as response appropriateness.

Results: ChatGPT significantly outperformed Bing AI on questions 1-3, which evaluate the clarity, achievement, and relevance of responses (12.77 ± 1.71 vs. 10.17 ± 3.27; p < 0.01). In contrast, Bing AI always incorporated references, whereas ChatGPT never did. Consequently, the results for questions 4-6, which evaluated the quality of sources, consistently favored Bing AI over ChatGPT (10.8 vs. 4.28; p < 0.01). Notably, neither chatbot offered guidance against guidelines for pre-operative testing. However, recommendations against guidelines were notable in specific scenarios: 30.5% for the treatment of adults with ureteral stones, 52.5% for adults with renal stones, and 20.5% for all patient treatment.

Conclusions: ChatGPT significantly outperformed Bing AI in terms of providing responses with clear aim, achieving such aim, and relevant and appropriate responses based on AUA surgical stone management guidelines. However, Bing AI provides references, allowing information quality assessment. Additional studies are needed to further evaluate these chatbots and their potential use by clinicians and patients for urologic healthcare-related questions.

Source journal

World Journal of Urology (Medicine — Urology & Nephrology)
CiteScore: 6.80
Self-citation rate: 8.80%
Articles published per year: 317
Review time: 4-8 weeks
Journal description: The WORLD JOURNAL OF UROLOGY regularly conveys the essential results of urological research and their practical and clinical relevance to a broad audience of urologists in research and clinical practice. In order to guarantee a balanced program, articles are published to reflect developments in all fields of urology at an internationally advanced level. Each issue treats a main topic in review articles by invited international experts. Free papers are articles unrelated to the main topic.