From ChatGPT to UroGPT: A guideline-trained artificial intelligence model for male infertility.

IF 1.3 | CAS Region 4 (Medicine) | Q4 | UROLOGY & NEPHROLOGY
Current Urology | Pub Date: 2026-05-01 | Epub Date: 2026-01-29 | DOI: 10.1097/CU9.0000000000000328
Elie Kaplan-Marans, Yitzchak E Katlowitz, Michael West, Navid Leelani, Christopher Edwards, David Silver, Jacob Khurgin
{"title":"From ChatGPT to UroGPT: A guideline-trained artificial intelligence model for male infertility.","authors":"Elie Kaplan-Marans, Yitzchak E Katlowitz, Michael West, Navid Leelani, Christopher Edwards, David Silver, Jacob Khurgin","doi":"10.1097/CU9.0000000000000328","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>ChatGPT is not yet sufficiently reliable for answering clinical questions relevant to direct patient care. We hypothesized that a GPT model trained exclusively on expert guidelines would provide more accurate, guideline-concordant responses.</p><p><strong>Materials and methods: </strong>With permission from the European Association of Urology, we developed UroGPT, a custom GPT model trained solely on the European Association of Urology guidelines. We posed 25 clinical questions derived from the Male Infertility Guidelines and expert opinions to both the standard ChatGPT (GPT-4o) and UroGPT. Responses were anonymized and graded by 2 blinded reviewers as \"complete and accurate,\" \"incomplete but accurate,\" and \"incorrect or misleading.\" Guideline concordance was compared using the chi-square test.</p><p><strong>Results: </strong>UroGPT demonstrated significantly greater concordance with guideline-based responses than ChatGPT (<i>p</i> < 0.001). UroGPT provided 94% (47/50) complete and accurate responses, whereas ChatGPT provided only 38% (19/50). ChatGPT also produced a significantly higher rate of incorrect or misleading responses (52% vs. 4%). Inter-reviewer agreement was higher for UroGPT (88% vs. 48%), suggesting that its answers were clearer and more consistent with the guidelines. ChatGPT frequently overgeneralized, recommended unsupported interventions, or offered non-guideline-based lifestyle advice. However, both models failed to answer correctly 2 high-stakes questions regarding orchiectomy in patients with undescended testes.</p><p><strong>Conclusions: </strong>UroGPT markedly outperformed ChatGPT in guideline concordance. Training artificial intelligence models on expert-authored content represents a meaningful step toward developing clinically useful large language models. However, UroGPT is not yet appropriate for direct patient care and should currently be used only for research and academic purposes.</p>","PeriodicalId":39147,"journal":{"name":"Current Urology","volume":"20 3","pages":"135-140"},"PeriodicalIF":1.3000,"publicationDate":"2026-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13068478/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current Urology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/CU9.0000000000000328","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/1/29 0:00:00","PubModel":"Epub","JCR":"Q4","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Background: ChatGPT is not yet sufficiently reliable for answering clinical questions relevant to direct patient care. We hypothesized that a GPT model trained exclusively on expert guidelines would provide more accurate, guideline-concordant responses.

Materials and methods: With permission from the European Association of Urology, we developed UroGPT, a custom GPT model trained solely on the European Association of Urology guidelines. We posed 25 clinical questions derived from the Male Infertility Guidelines and expert opinions to both the standard ChatGPT (GPT-4o) and UroGPT. Responses were anonymized and graded by 2 blinded reviewers as "complete and accurate," "incomplete but accurate," or "incorrect or misleading." Guideline concordance was compared using the chi-square test.
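
The paper does not publish UroGPT's implementation (it was built as a custom GPT inside ChatGPT). Purely as an illustrative sketch of the general approach of grounding a model in guideline text, the following uses the OpenAI Python SDK; the file name, prompt wording, and sample question are assumptions, not the authors' setup.

```python
# Hypothetical sketch: constrain a chat model to answer only from supplied
# guideline text, loosely mimicking a guideline-trained custom GPT.
# Assumptions: OpenAI Python SDK installed, OPENAI_API_KEY set, and a local
# plain-text guideline excerpt (not provided by the paper) saved as
# "eau_male_infertility_excerpt.txt".
from openai import OpenAI

client = OpenAI()

# Load the guideline text that will ground the model's answers.
with open("eau_male_infertility_excerpt.txt", encoding="utf-8") as f:
    guideline_text = f.read()

system_prompt = (
    "You are a urology assistant. Answer ONLY from the guideline text "
    "provided below. If the guidelines do not address the question, say so "
    "rather than speculating.\n\n--- GUIDELINE TEXT ---\n" + guideline_text
)

response = client.chat.completions.create(
    model="gpt-4o",  # the baseline model compared in the study
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user",
         "content": "When is varicocele repair indicated in an infertile man?"},
    ],
)
print(response.choices[0].message.content)
```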

Results: UroGPT demonstrated significantly greater concordance with guideline-based responses than ChatGPT (p < 0.001). UroGPT provided 94% (47/50) complete and accurate responses, whereas ChatGPT provided only 38% (19/50). ChatGPT also produced a significantly higher rate of incorrect or misleading responses (52% vs. 4%). Inter-reviewer agreement was higher for UroGPT (88% vs. 48%), suggesting that its answers were clearer and more consistent with the guidelines. ChatGPT frequently overgeneralized, recommended unsupported interventions, or offered non-guideline-based lifestyle advice. However, both models failed to correctly answer 2 high-stakes questions regarding orchiectomy in patients with undescended testes.
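
As a quick plausibility check (not the authors' analysis script), the reported percentages can be converted back to counts out of the 50 graded responses per model and run through a chi-square test. The "incomplete but accurate" counts below are remainders inferred from the abstract, not figures it states directly.

```python
# Reconstructed 2x3 contingency table from the reported percentages
# (rows: model; columns: complete/accurate, incomplete/accurate,
# incorrect/misleading). 94% -> 47/50 and 4% -> 2/50 for UroGPT;
# 38% -> 19/50 and 52% -> 26/50 for ChatGPT; middle column is the remainder.
from scipy.stats import chi2_contingency

table = [
    [47, 1, 2],   # UroGPT
    [19, 5, 26],  # ChatGPT
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2e}")
# p comes out far below 0.001, consistent with the reported p < 0.001.
```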

Conclusions: UroGPT markedly outperformed ChatGPT in guideline concordance. Training artificial intelligence models on expert-authored content represents a meaningful step toward developing clinically useful large language models. However, UroGPT is not yet appropriate for direct patient care and should currently be used only for research and academic purposes.

Source journal: Current Urology (Medicine: Urology)
CiteScore: 2.30
Self-citation rate: 0.00%
Annual publication count: 96