Capabilities of ChatGPT-3.5 as a Urological Triage System

IF 3.2 · CAS Zone 3 (Medicine) · JCR Q1, UROLOGY & NEPHROLOGY

European Urology Open Science, Volume 70, Pages 148–153. Pub date: 2024-11-01. DOI: 10.1016/j.euros.2024.10.015

Christopher Hirtsiefer, Tim Nestler, Johanna Eckrich, Henrieke Beverungen, Carolin Siech, Cem Aksoy, Marianne Leitsmann, Martin Baunacke, Annemarie Uhlig

Abstract

Background and objective

Patients struggle to classify symptoms, which hinders timely medical presentation. Between 35% and 75% of patients seek information online before consulting a health care professional, and generative language–based artificial intelligence (AI), exemplified by ChatGPT-3.5 (GPT-3.5) from OpenAI, has emerged as an important source of such information. The aim of our study was to evaluate the role of GPT-3.5 in triaging acute urological conditions, addressing a gap in current research.

Methods

We assessed the performance of GPT-3.5 in providing urological differential diagnoses (DD) and recommending a course of action (CoA). Six acute urological pathologies were selected for evaluation. Lay descriptions sourced from patient forums formed the basis for 472 queries, which nine urologists entered independently. We evaluated the outputs for compliance with the European Association of Urology (EAU) guidelines, assessed the quality of the patient information with the validated DISCERN questionnaire, and performed a linguistic analysis.

Key findings and limitations

The median GPT-3.5 ratings were 4/5 for DD and CoA, and 3/5 for overall information quality. English outputs received higher median ratings than German outputs for DD (4.27 vs 3.95; p < 0.001) and CoA (4.25 vs 4.05; p < 0.005). There was no difference in performance between urgent and non-urgent cases. Analysis of the information quality revealed notable underperformance for source indication, risk assessment, and influence on quality of life.

Conclusion and clinical implications

Our results highlight the potential of GPT-3.5 as a triage system that offers individualized, empathetic advice mostly aligned with the EAU guidelines and outscores other online information sources. Relevant shortcomings in information quality, especially in risk assessment, need to be addressed to enhance its reliability. Broader transparency and quality improvements are needed before integration into patient care, primarily in English-speaking settings.

Patient summary

We looked at the performance of ChatGPT-3.5 for patients seeking urology advice. We entered more than 400 German and English queries and assessed the possible diagnoses suggested by this artificial intelligence tool. ChatGPT-3.5 scored well in providing a complete list of possible diagnoses and in recommending a course of action mostly in line with current guidelines. The quality of the information was good overall, but missing or unclear information sources can be a problem.
