EVIDENCE-BASED DIGITAL SUPPORT IN HEPATOLOGY: RETRIEVAL-AUGMENTED GENERATION'S ROLE IN AUTOIMMUNE LIVER DISEASES MANAGEMENT

Impact factor: 4.4; Zone 3 (Medicine); JCR Q2 (Gastroenterology & Hepatology)
Ezequiel Ridruejo, Ernesto Saenz, Jimmy Daza, Heike Bantel, Marcos Girala, Matthias Ebert, Florian Van Bommel, Andreas Geier, Andres Gomez Aldana, Mario Reis Alvares-da-Silva, Markus Peck-Radosavljevic, Frank Tacke, Arndt Weinmann, Juan Turnes, Javier Pazo, Andreas Teufel
Annals of Hepatology, volume 30, Article 101957. Published 2025-09-01. DOI: 10.1016/j.aohep.2025.101957. Full text: https://www.sciencedirect.com/science/article/pii/S1665268125001826

Abstract

Introduction and Objectives

Autoimmune liver diseases (AILDs) present significant diagnostic and management challenges. Following our initial evaluation of large language models (LLMs), we developed and assessed three specialized retrieval-augmented generation (RAG) systems. These systems incorporated clinical guidelines and medication safety information to improve the accuracy of decision support. Our aim was to evaluate how effectively retrieval-augmented AI systems provide evidence-based recommendations for AILD management.

Materials and Methods

We built three RAG systems: HepaChat, RAG-ChatGPT, and RAG-Claude. Each system integrated 13 international clinical guidelines covering the management of autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and primary sclerosing cholangitis (PSC). We also incorporated a database of 12,465 FDA medication warnings to support adherence to safety protocols. Ten liver specialists (six from Europe, four from the Americas) rated each system's responses to 56 standardized clinical questions on a 1-10 Likert scale. The questions addressed disease comprehension, therapeutic approaches, and clinical decision-making across all three major AILDs.
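The retrieval step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe how any of the three systems index or retrieve text. The corpus entries, the bag-of-words cosine scoring, and the names `CORPUS`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and the passages are placeholders rather than real guideline or FDA text.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder passages. The study's actual corpus
# (13 guidelines, 12,465 FDA warnings) is not reproduced here.
CORPUS = [
    {"source": "guideline:AIH",
     "text": "Placeholder passage about autoimmune hepatitis treatment and monitoring."},
    {"source": "guideline:PBC",
     "text": "Placeholder passage about primary biliary cholangitis treatment response."},
    {"source": "guideline:PSC",
     "text": "Placeholder passage about primary sclerosing cholangitis surveillance."},
    {"source": "fda_warning",
     "text": "Placeholder medication warning about hepatotoxicity monitoring."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, corpus=CORPUS, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q, Counter(tokenize(doc["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble a prompt that grounds the model in the retrieved passages."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "How is primary biliary cholangitis treatment response assessed?"
prompt = build_prompt(question, retrieve(question))
```

A production system would more likely use dense embeddings and a vector index; the word-count similarity here only keeps the sketch dependency-free.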

Results

HepaChat achieved the highest mean score (7.58±1.48) and the most best-rated responses (33), compared with RAG-Claude (7.22±1.58, 12 best-rated) and RAG-ChatGPT (7.21±1.67, 9 best-rated). Geographic stratification revealed differences in rating patterns (Americas: 7.97 vs. Europe: 6.40). In disease-specific analysis, HepaChat performed well in AIH (Europe: 7.12; Americas: 8.17) and in PSC management in Europe (6.89), and achieved its best performance in AIH and PBC in the Americas (8.17 and 8.37, respectively). All three systems showed marked improvement over conventional LLMs (2023 benchmark: 6.72±1.67).
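Summary figures of this kind (mean ± SD per system and a count of best-rated responses) could be derived from raw ratings roughly as follows. This is a sketch under stated assumptions: the ratings below are invented for illustration and are not study data, the sample standard deviation is assumed, and leaving tied questions uncounted is an assumption, since the abstract does not say how ties were handled.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical ratings, NOT study data:
# ratings[system][question_id] -> list of 1-10 scores, one per rater.
ratings = {
    "HepaChat":    {"q1": [8, 7, 9], "q2": [6, 7, 7]},
    "RAG-Claude":  {"q1": [7, 7, 8], "q2": [7, 8, 7]},
    "RAG-ChatGPT": {"q1": [6, 8, 7], "q2": [6, 6, 7]},
}


def summarize(ratings):
    """Return {system: (mean, sample SD)} pooled over all questions and raters."""
    summary = {}
    for system, by_question in ratings.items():
        scores = [s for qs in by_question.values() for s in qs]
        summary[system] = (mean(scores), stdev(scores))
    return summary


def best_rated_counts(ratings):
    """Count, per system, the questions on which it has the top mean score.

    Assumption: a question with a tie for first place is counted for no one.
    """
    questions = next(iter(ratings.values())).keys()
    counts = Counter()
    for q in questions:
        means = {system: mean(by_q[q]) for system, by_q in ratings.items()}
        top = max(means.values())
        winners = [s for s, m in means.items() if m == top]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts


for system, (m, sd) in summarize(ratings).items():
    print(f"{system}: {m:.2f}±{sd:.2f}")
print(dict(best_rated_counts(ratings)))
```

The same `summarize` function could be applied to a subset of raters (for example, only European raters) to reproduce a geographic stratification.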

Conclusions

This evaluation indicates that specialized RAG systems incorporating clinical guidelines and safety protocols can substantially improve support for AILD management. The geographic variation in ratings highlights the importance of considering regional clinical perspectives when developing AI systems.
Source journal: Annals of Hepatology (Medicine: Gastroenterology & Hepatology)
CiteScore: 7.90
Self-citation rate: 2.60%
Articles published: 183
Review time: 4-8 weeks
About the journal: Annals of Hepatology publishes original research on the biology and diseases of the liver in both humans and experimental models. Contributions may be submitted as regular articles. The journal also publishes concise reviews of both basic and clinical topics.