Evaluation of DeepSeek-R1 and ChatGPT-4o as educational sources for upper tract urothelial carcinoma.

IF 1.9 Q3 UROLOGY & NEPHROLOGY
Central European Journal of Urology. Pub Date: 2026-01-01; Epub Date: 2026-01-13; DOI: 10.5173/ceju.2025.0238
Wojciech Krajewski, Jan Łaszkiewicz, Łukasz Biesiadecki, Wojciech Tomczak, Łukasz Nowak, Piotr Łaszkiewicz, Joanna Chorbińska, Francesco Del Giudice, Benjamin I Chung, Tomasz Szydełko
Volume 79, issue 1, pages 1-8.

Abstract

Introduction: Upper tract urothelial carcinoma (UTUC) is associated with poor survival outcomes. Therefore, providing reliable information about UTUC is crucial. Recently, chatbots powered by large language models have become a widely used information source. Our aim was to evaluate and compare responses generated by ChatGPT-4o and DeepSeek-R1 to patient-important questions regarding UTUC.

Material and methods: A set of 43 questions, assigned to four categories (general information, symptoms and diagnosis, treatment, prognosis), was curated. Each question was entered into DeepSeek-R1 and ChatGPT-4o. Answers were rated by two urologists on a scale from 1 (completely incorrect) to 4 (fully correct). The median score was calculated for each question, and median scores ≥3 were considered accurate. The repeatability of responses was evaluated using cosine similarity, and the number of words in each response was counted.
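The scoring rule described in the methods can be sketched in Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and counting words by splitting on whitespace is an assumption, since the abstract does not say how words were counted. With two raters, the median equals the mean of the two scores, so a 2 and a 4 give a median of 3 and the answer still counts as accurate.

```python
from statistics import median

# Rating scale from the abstract: 1 = completely incorrect ... 4 = fully correct.
VALID_SCORES = {1, 2, 3, 4}
ACCURACY_THRESHOLD = 3  # a median score >= 3 was considered accurate


def question_median(ratings: list[int]) -> float:
    """Median of the raters' scores for one answer (two urologists in the study)."""
    if not ratings or any(r not in VALID_SCORES for r in ratings):
        raise ValueError("ratings must be a non-empty list of scores from 1 to 4")
    return median(ratings)


def is_accurate(ratings: list[int]) -> bool:
    """True if the answer's median score reaches the accuracy threshold."""
    return question_median(ratings) >= ACCURACY_THRESHOLD


def percent_accurate(all_ratings: list[list[int]]) -> float:
    """Percentage of questions whose answer was rated accurate."""
    if not all_ratings:
        raise ValueError("no questions to evaluate")
    accurate = sum(is_accurate(r) for r in all_ratings)
    return 100 * accurate / len(all_ratings)


def word_count(response: str) -> int:
    """Word count by whitespace splitting (hypothetical tokenization rule)."""
    return len(response.split())


if __name__ == "__main__":
    ratings = [[4, 3], [2, 4], [2, 3]]  # made-up ratings for three answers
    print([question_median(r) for r in ratings])  # [3.5, 3.0, 2.5]
    print(round(percent_accurate(ratings), 1))  # 66.7
```

Assuming each model was scored on all 43 questions, 40 accurate answers give 100 × 40 / 43 ≈ 93.0% and 39 give ≈ 90.7%, which is consistent with the reported 93% and 91% after rounding.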

Results: The median scores for DeepSeek-R1 and ChatGPT-4o were both 3.5. There was no statistically significant difference between the scores assigned to the two chatbots, either across all questions (p = 0.35) or within any individual category. DeepSeek-R1 and ChatGPT-4o provided satisfactory answers to 93% and 91% of the evaluated questions, respectively. No potentially dangerous information was found. Both models consistently generated responses with moderate-to-high similarity (cosine similarity >0.5), except for one query. Finally, DeepSeek-R1 provided significantly longer answers than ChatGPT-4o (p <0.001).
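Repeatability was summarized with cosine similarity, cos(u, v) = (u · v) / (‖u‖ ‖v‖), between two responses to the same question. The abstract does not state how responses were turned into vectors, so this sketch assumes plain bag-of-words term counts; the study may have used TF-IDF weights or sentence embeddings instead, which would give different values. The function names are hypothetical, and the 0.5 cut-off mirrors the abstract's moderate-to-high similarity threshold.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.5  # abstract: cosine similarity > 0.5 = moderate-to-high


def term_counts(text: str) -> Counter:
    """Bag-of-words vector: lower-cased alphanumeric tokens and their counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """cos(u, v) = (u . v) / (|u| * |v|) for the term-count vectors of a and b."""
    u, v = term_counts(a), term_counts(b)
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0  # empty text -> similarity 0 by convention


def is_repeatable(first: str, second: str) -> bool:
    """True if two responses to the same query exceed the similarity threshold."""
    return cosine_similarity(first, second) > SIMILARITY_THRESHOLD


if __name__ == "__main__":
    r1 = "UTUC is treated with surgery"
    r2 = "UTUC is often treated with surgery"
    print(round(cosine_similarity(r1, r2), 2))  # 0.91
    print(is_repeatable(r1, r2))  # True
```

Because term counts ignore word order and synonyms, two answers that paraphrase each other can fall below 0.5 even when they agree clinically; embedding-based vectors are one way to reduce that sensitivity.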

Conclusions: Both DeepSeek-R1 and ChatGPT-4o predominantly provide satisfactory responses to patient-important questions about UTUC. Artificial intelligence chatbots show potential as first-line information sources for patients, but they struggle with highly specialized inquiries and thus cannot replace expert medical advice.
