Daniel R Hanna, Michael L Creswell, Russell S Terry, Lucas B Vergamini, Mihaela Sardiu, Holly E Du, Amber K McMahon, Wilson R Molina, Bristol B Whiles
Bing chat for kidney stone management questions based on the AUA guidelines: a comparison of chatbot conversation style modes.
World Journal of Urology 43(1):151. Published 2025-03-06. DOI: 10.1007/s00345-025-05533-4 (https://doi.org/10.1007/s00345-025-05533-4)
Abstract
Purpose: Artificial intelligence (AI) technology will inevitably permeate healthcare. Bing Chat is an AI chatbot with different conversation styles. We evaluated the answers given in each of these response modes to questions about the management of nephrolithiasis.
Methods: A total of 20 questions were created based on the AUA Surgical Management of Stones guidelines. Bing Chat's responses in the Precise, Balanced, and Creative conversation style modes were evaluated by three physicians using the Brief DISCERN tool. Consensus scoring was employed to assess appropriateness, guideline adherence, empathy, recommendation for physician consultation, and inability to answer the inquiry. Responses were also assessed for their directness and the presence of superfluous information. Chat modes were compared using descriptive statistics as well as ANOVA, chi-squared tests, and Fisher's exact tests.
Results: The median Brief DISCERN scores in Precise, Balanced, and Creative modes were 22, 21, and 21, respectively. There was no significant difference in Brief DISCERN scores between the three chat modes (p = 0.68). Guideline adherence was similar across conversation styles (p = 0.37), as were response appropriateness (p = 0.62), directly answering the question asked (p = 0.26), and providing a recommendation to consult with a healthcare provider (p = 0.07). Creative and Balanced modes outperformed Precise mode on response empathy. Creative mode was more likely to include superfluous information and less likely to answer the question.
Conclusion: In its current iteration, Bing Chat provides low-quality urologic healthcare information for nephrolithiasis queries, regardless of the conversation style used.
About the journal:
The WORLD JOURNAL OF UROLOGY regularly conveys the essential results of urological research and their practical and clinical relevance to a broad audience of urologists in research and clinical practice. To guarantee a balanced program, articles are published to reflect developments in all fields of urology at an internationally advanced level. Each issue treats a main topic in review articles by invited international experts. Free papers are articles unrelated to the main topic.