Improving clinical efficiency using retrieval-augmented generation in urologic oncology: A guideline-enhanced artificial intelligence approach

BJUI Compass · IF 1.6 · Q3 (Urology & Nephrology)
Pub Date: 2024-12-10 · DOI: 10.1002/bco2.427
Harry Collin, Matthew J. Roberts, Kandice Keogh, Amila Siriwardana, Marnique Basto

Abstract

Artificial intelligence (AI) in urology is evolving and has rapidly expanded since the release of ChatGPT and other large language models (LLMs). Early studies have found that AI-generated responses to patient questions are of moderate to high quality across multiple uro-oncology domains.1 Extension into clinical decision-making suggests that ChatGPT can make decisions aligned with evidence-based medicine.2

The key limitation of the publicly available ChatGPT (version 3.5) has been its reliance on knowledge confined to data published before September 2021.3 OpenAI has since released ChatGPT 4.0 through the subscription-based ChatGPT Plus, which is capable of web browsing. Equipped with the European Association of Urology (EAU) Guidelines, its responses to urological queries are of higher quality,4 and ChatGPT 4.0 has been shown to make complex medical decisions concordant with those discussed in multidisciplinary team meetings.5 Furthermore, ChatGPT 4.0 allows LLMs to be curated for a specific task with retrieval-augmented generation (RAG), whereby a Generative Pre-trained Transformer (GPT) uses additional context from specialised materials to improve the accuracy of its responses. Such advancements may realise the potential of AI systems to further urology practice through the incorporation of up-to-date and highly specialised knowledge.

For instance, post-treatment imaging surveillance for renal cell carcinoma (RCC) is a common yet challenging clinical scenario due to discordance between histopathological diversity and guideline algorithms. The EAU Guidelines offer structured recommendations for follow-up imaging surveillance, which is pivotal for timely detection of cancer recurrence. However, effective clinical application of these guidelines requires specialised understanding of histopathology and can often be repetitive and time-consuming, particularly when assigned to junior doctors.

This study aimed to test a GPT customised with RAG to interpret post-nephrectomy RCC histopathology reports and determine the recommended follow-up surveillance imaging according to the EAU Guidelines.

A RAG system was created using ChatGPT 4.0 (OpenAI via ChatGPT Plus, https://chat.openai.com/gpts/editor). The 2023 EAU Guidelines on RCC were uploaded to the GPT including Chapter 3 (Epidemiology, Aetiology and Pathology), Chapter 4 (Staging and Classification Systems) and Chapter 8 (Follow-Up in RCC). Code Interpreter capabilities were enabled, which allows the GPT to retrieve uploaded files and analyse data. Web browsing was disabled. No formal coding training or experience was required.
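The retrieval step behind such a custom GPT can be illustrated in plain Python. This is a minimal sketch, not the actual mechanism used by ChatGPT's GPT editor (which handles file chunking and retrieval internally): the chapter titles follow the uploaded guideline structure above, but the chunk texts are paraphrased placeholders and the keyword-overlap scoring stands in for the embedding search a production system would use.

```python
# Illustrative sketch of retrieval-augmented generation (RAG):
# guideline chapters are held as retrievable chunks, the chunk most
# relevant to the query is selected, and both are combined into the
# prompt sent to the model. Keyword overlap is a stand-in for the
# embedding similarity a real retrieval system would use.

GUIDELINE_CHUNKS = {
    "Chapter 3: Epidemiology, Aetiology and Pathology":
        "RCC subtypes include clear cell, papillary and chromophobe tumours",
    "Chapter 4: Staging and Classification Systems":
        "TNM staging and ISUP/WHO nuclear grade classify tumours",
    "Chapter 8: Follow-Up in RCC":
        "Risk-based surveillance imaging intervals after nephrectomy by risk profile",
}

def retrieve(query: str) -> tuple[str, str]:
    """Return the (title, text) chunk sharing the most words with the query."""
    q = set(query.lower().split())
    def overlap(item: tuple[str, str]) -> int:
        title, text = item
        return len(q & set((title + " " + text).lower().split()))
    return max(GUIDELINE_CHUNKS.items(), key=overlap)

def build_prompt(report: str) -> str:
    """Augment the model prompt with the retrieved guideline context."""
    title, text = retrieve("follow-up surveillance imaging after nephrectomy")
    return f"Context ({title}): {text}\n\nHistopathology report:\n{report}"
```

In the actual study this retrieval is performed internally by the GPT's file tools; the sketch only makes the "augment the prompt with guideline context" idea concrete.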

All instructions for the GPT were in free text (shown in Table S1). Instructions were written as three steps: interpret the histopathology, determine the surveillance regimen and produce the output. The GPT was given clear directives to determine the risk profile (low, intermediate or high risk) according to the EAU Guidelines, which use the Leibovich score for clear cell RCC (ccRCC) and histopathological stage and grade for non-ccRCC. The GPT was then instructed to recommend a follow-up imaging surveillance regimen based on a template relative to risk profile (EAU Guidelines, tab. 8.1).

Simulated histopathology reports were created to represent all possible risk profiles (low, intermediate and high) relevant to the EAU Guidelines for the most common RCC subtypes—ccRCC, papillary (pRCC) and chromophobe (chRCC) tumours. Reports were structured according to International Society of Urological Pathology (ISUP) guidelines. Per the NHMRC National Statement on Ethical Conduct in Human Research 2023, this study did not require ethics committee approval as it utilised theoretical cases and does not meet the definition of human or animal research. Each report was input to the custom GPT on 15 January 2024. Responses were reviewed by two board-certified urologists for concordance with the EAU Guidelines.

Full histopathology reports and their raw outputs are shown in Table S2. Results are summarised in Figure 1, and concordance of custom GPT outputs with the EAU Guidelines for each simulated histopathology report is shown in Figure S3.

The custom GPT correctly determined all RCC risk profiles. All Leibovich scores but one were correct (a 72 mm lesion was incorrectly classified as a tumour greater than 10 cm). All outputs recommended surveillance scans at the post-treatment intervals outlined in the EAU Guidelines. Three of the eight surveillance regimens (38%) were precisely concordant, while the remaining five contained additional imaging (3-month imaging for intermediate risk and 30-month imaging for high risk). The custom GPT proposed 2-yearly scans beyond 5 years for low-risk pRCC, contrary to the EAU Guidelines, which suggest no further surveillance.
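The concordance analysis above reduces to a set comparison between recommended and guideline scan timepoints. The sketch below uses hypothetical interval months (not the actual EAU tab. 8.1 values) to show how precise concordance, missed scans and additional scans are distinguished:

```python
def compare_regimens(recommended: set[int], guideline: set[int]) -> dict:
    """Compare recommended scan timepoints (months post-treatment)
    against a guideline template. Missed scans are the safety concern;
    extra scans merely indicate an overly cautious regimen."""
    return {
        "concordant": recommended == guideline,
        "missed": sorted(guideline - recommended),
        "extra": sorted(recommended - guideline),
    }

# Hypothetical example mirroring the pattern reported in the text:
# an intermediate-risk regimen with one additional 3-month scan.
guideline = {6, 12, 24, 36, 60}        # illustrative months, not EAU tab. 8.1
recommended = {3, 6, 12, 24, 36, 60}
result = compare_regimens(recommended, guideline)
# result["missed"] is empty (safe: no interval scan omitted),
# while result["extra"] flags the additional early scan.
```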

Most (6/8) recommendations specified imaging modality and body region (CT Chest and Abdomen). All outputs stated a guideline basis for the recommendation. Some outputs recommended renal and cardiovascular monitoring, which is mentioned in Chapter 8 of the EAU Guidelines, despite no specific prompting.

This study demonstrates the initial potential of RAG AI systems with integrated clinical guidelines to interpret results and make recommendations. This novel approach indicates the capacity of custom GPTs to handle complex and algorithmic tasks, which are often time-consuming and prone to human error. The surveillance regimens generated under tailored instructions (38% concordance) may represent an improvement over non-specialised ChatGPT 4 outputs, which lacked focused access to specific guidelines and achieved only 26% guideline concordance for prostate cancer.6 Our results also compare favourably with studies using web-enabled ChatGPT 4.0, in which 27% of responses to questions on kidney cancer, adapted from the EAU Guidelines, were of excellent quality.7 Surveillance regimens were consistently safe, with no missing interval scans and additional scans in 62% of regimens, indicating a cautious approach.

AI supplementation of specialised urology knowledge has previously required programming skills.4, 8 In this novel approach with a RAG design, we showed accurate, safe outputs from free-text instructions without prior coding training. Consequently, ChatGPT 4.0 enables medical professionals to combine highly specialised knowledge with AI and tailor it to the needs of their clinical practice.

The inevitable introduction of AI to clinical settings must be met with close oversight from clinicians, especially when nuanced clinical judgements are involved. Here, the custom GPT miscalculated the Leibovich score for one theoretical report and could not consistently translate risk profile into a precisely concordant surveillance regimen. Conversely, unprompted recommendations such as renal and cardiovascular monitoring promoted individualised care, suggesting a capacity for comprehensive interpretation that could enhance clinical practice.

A limitation of this study was the partial inclusion of a single international guideline, which, despite being endorsed by 75 international societies, may have limited the comprehensiveness of the GPT's integration. Future studies and AI developments could consider other guidelines (e.g., American Urological Association, National Institute for Health and Care Excellence). Ideally, development of a multi-guideline framework that utilises a decision-tree methodology could harness the power of AI to select and synthesise the guidelines most pertinent to the local jurisdiction (limiting conflicts) or to patient preferences for individualised care. Furthermore, web browsing was disabled to focus the AI on the uploaded guidelines, but this may have limited wider information access and integration, potentially affecting its handling of complex queries. Additionally, while the small series of histopathology reports was intended to limit the complexity of the custom GPT analysis, future expansion of source number and content, as well as a training set, may further assess and enhance the custom GPT's capabilities.
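The proposed multi-guideline, decision-tree framework could begin as simply as a jurisdiction lookup over guideline corpora before retrieval. The mapping below is hypothetical and the source names are illustrative only:

```python
# Hypothetical sketch of a multi-guideline framework: a simple decision
# step selects which guideline corpus to retrieve from based on the
# local jurisdiction, falling back to the EAU Guidelines by default.
GUIDELINE_SOURCES = {
    "EU": "EAU Guidelines on RCC",
    "US": "AUA guideline",    # American Urological Association
    "UK": "NICE guidance",    # National Institute for Health and Care Excellence
}

def select_guideline(jurisdiction: str) -> str:
    """Return the guideline corpus for a jurisdiction (EAU as default)."""
    return GUIDELINE_SOURCES.get(jurisdiction, GUIDELINE_SOURCES["EU"])
```

A fuller framework would also reconcile conflicting recommendations between the selected corpora, as discussed above.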

In conclusion, this focused evaluation of a GPT with integrated clinical guidelines illustrated the potential of AI, particularly RAG systems, for decision-making accuracy across the most common histopathological subtypes. Future incorporation may streamline clinical workflows and decision-making, but only with further evaluation and cautious integration to ensure that these systems augment, not replace, clinician-directed, personalised, evidence-based care.

There are no conflicts of interest.
