Clinical applications and limitations of large language models in nephrology: a systematic review.

Impact factor 4.6 | CAS Zone 2 (Medicine) | JCR Q1, Urology & Nephrology
Clinical Kidney Journal, 18(9): sfaf243. Published 2025-09-18; eCollection date 2025-09-01. DOI: 10.1093/ckj/sfaf243
Zoe Unger, Shelly Soffer, Orly Efros, Lili Chan, Eyal Klang, Girish N Nadkarni

Abstract

Background: Large language models (LLMs) have emerged as potential tools in healthcare. This systematic review evaluates the applications of text-generative conversational LLMs in nephrology, with particular attention to their reported advantages and limitations.

Methods: A systematic search was performed in PubMed, Web of Science, Embase and the Cochrane Library in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Eligible studies assessed LLM applications in nephrology. PROSPERO registration number CRD42024550169.

Results: Of 1070 records screened, 23 studies met inclusion criteria, addressing four clinical applications in nephrology. In patient education (n = 13), GPT-4 improved the readability of kidney donation information from a 10th to a 4th grade level (9.6 ± 1.9 to 4.30 ± 1.71) and Gemini provided the most accurate answers to chronic kidney disease questions (Global Quality Score 3.46 ± 0.55). Regarding workflow optimization (n = 7), GPT-4 achieved high accuracy (90-94%) in managing continuous renal replacement therapy alarms and improved diagnosis of diabetes insipidus using chain-of-thought and retrieval-augmented prompting. In renal dietary guidance (n = 2), Bard AI led in classifying phosphorus and oxalate content of foods (100% and 84%), while GPT-4 and Bing Chat were most accurate for potassium classification (81%). For laboratory data interpretation (n = 1), Copilot significantly outperformed ChatGPT and Gemini in simulated nephrology datasets (median scores 5/5 compared with 4/5 and 4/5; P < .01). TRIPOD-LLM assessment revealed frequent omissions in data handling, prompting strategies and transparency.

Conclusions: While LLMs may enhance various aspects of nephrology practice, their widespread adoption remains premature. Input-quality dependence and limited external validation restrict generalizability. Further research is needed to confirm their real-world feasibility and ensure safe clinical integration.
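The Results express readability as a U.S. school grade level. The abstract does not say which readability formula the underlying study used, so the sketch below assumes the Flesch-Kincaid Grade Level, one widely used grade-level formula: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The helper names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the vowel-group syllable counter is a rough heuristic rather than what validated readability tools use.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("give" -> 1), but keep "-le" endings ("table" -> 2).
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level (assumed metric; not confirmed by the source)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


if __name__ == "__main__":
    # Hypothetical example texts, not taken from any included study.
    technical = ("Living kidney donation necessitates comprehensive preoperative evaluation, "
                 "including immunological compatibility assessment and psychosocial screening.")
    plain = "You can give a kidney. We will test you first. Ask us if you have questions."
    print(round(flesch_kincaid_grade(technical), 2), round(flesch_kincaid_grade(plain), 2))
```

Shorter sentences and fewer polysyllabic words both lower the score, which is the kind of change an LLM rewrite targets. For scale, the reported means (9.6 ± 1.9 before, 4.30 ± 1.71 after GPT-4 rewriting) correspond to a drop of about 5.3 grade levels on whatever metric the study applied.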

Source journal
Clinical Kidney Journal (Medicine-Transplantation)
CiteScore: 6.70
Self-citation rate: 10.90%
Articles published: 242
Review time: 8 weeks
About the journal: Clinical Kidney Journal: Clinical and Translational Nephrology (ckj), an official journal of the ERA-EDTA (European Renal Association-European Dialysis and Transplant Association), is a fully open access, online-only journal publishing bimonthly. The journal is an essential educational and training resource integrating clinical, translational and educational research into clinical practice. ckj aims to contribute to a translational research culture among nephrologists and kidney pathologists that helps close the gap between basic researchers and practicing clinicians and promotes sorely needed innovation in the nephrology field. All research articles in this journal have undergone peer review.