Evaluation of Care Quality for Atrial Fibrillation Across Non-Interoperable Electronic Health Record Data using a Retrieval-Augmented Generation-enabled Large Language Model.

medRxiv: the preprint server for health sciences. Pub date: 2025-09-24. DOI: 10.1101/2024.09.19.24313992

Philip Adejumo, Phyllis M Thangaraj, Lovedeep S Dhingra, Dhruva Biswas, Arya Aminorroaya, Sumukh Vasisht Shankar, Aline F Pedroso, Philip M Croon, Rohan Khera

Abstract

Importance: Standardized assessment of clinical quality measures from electronic health records (EHRs) is challenging because information is fragmented across structured and unstructured data and because interoperability across systems is low. Traditionally, extracting this information requires manual EHR abstraction, a time-consuming and expensive process that also limits real-time care quality improvement.

Objective: To evaluate whether a data format-agnostic retrieval-augmented generation-enabled large language model (RAG-LLM) can accurately abstract clinical variables from heterogeneous structured and unstructured EHR data.
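
The abstract does not describe how retrieval works internally. The following is a minimal sketch of one possible retrieval step, assuming that structured rows are serialized into text so they can be ranked alongside note passages (one way to make retrieval format-agnostic), and using a bag-of-words cosine similarity as a stand-in for whatever embedding model the authors actually used. All function names are hypothetical, and the LLM call that would consume the retrieved passages is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def serialize_row(row: dict[str, str]) -> str:
    """Hypothetical: flatten one structured EHR row into a text passage."""
    return "; ".join(f"{key}: {value}" for key, value in row.items())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (ties keep input order)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), i) for i, p in enumerate(passages)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [passages[i] for _, i in scored[:k]]


# Hypothetical inputs: free-text notes plus one serialized structured row.
notes = [
    "Patient has a long history of hypertension treated with lisinopril.",
    "Follow-up for knee pain, no acute distress.",
]
rows = [{"diagnosis": "type 2 diabetes", "code": "E11.9"}]
passages = notes + [serialize_row(r) for r in rows]

# The top passages would then be placed in an LLM prompt (not shown).
top = retrieve("history of hypertension", passages, k=1)
print(top)
```

Serializing rows to text keeps a single retrieval index for both data types; a production system would more likely use dense embeddings and per-risk-factor queries.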

Design, setting, and participants: Retrospective cross-sectional study assessing stroke and bleeding risk in patients with atrial fibrillation (AF) from two health systems. We developed a RAG-LLM to extract CHA₂DS₂-VASc and HAS-BLED risk factors from tabular data and clinical documentation. The framework was validated on 300 expert-annotated patient records (200 from Yale New Haven Health System [YNHHS] and 100 from the Medical Information Mart for Intensive Care [MIMIC-IV]). The system was deployed on two large cohorts: 104,204 patients with AF from YNHHS (2013-2024) and 13,117 from MIMIC-IV (2008-2022). We compared anticoagulation recommendations derived from the RAG-LLM with those based on traditional structured data abstraction.
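
The abstract names the two scores but not how component flags from different sources are combined. This sketch uses the standard published point values for CHA₂DS₂-VASc and HAS-BLED and assumes (an assumption, not stated in the abstract) that a clinical risk factor counts when either the structured data or the RAG-LLM output reports it, with demographics taken from structured data. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StrokeFactors:
    """CHA2DS2-VASc inputs. Field names are illustrative, not taken from the paper."""
    age: int
    female: bool
    chf: bool = False               # C: congestive heart failure
    hypertension: bool = False      # H: hypertension
    diabetes: bool = False          # D: diabetes mellitus
    stroke_tia_te: bool = False     # S2: prior stroke, TIA, or thromboembolism
    vascular_disease: bool = False  # V: prior MI, peripheral artery disease, or aortic plaque


# Clinical flags either source may contribute; age and sex stay from structured data.
CLINICAL_FLAGS = ("chf", "hypertension", "diabetes", "stroke_tia_te", "vascular_disease")


def cha2ds2_vasc(f: StrokeFactors) -> int:
    """Standard CHA2DS2-VASc point values (range 0-9)."""
    score = int(f.chf) + int(f.hypertension) + int(f.diabetes) + int(f.vascular_disease)
    score += 2 * int(f.stroke_tia_te)                        # S2 counts double
    score += 2 if f.age >= 75 else 1 if f.age >= 65 else 0   # A2 (>=75) or A (65-74)
    score += int(f.female)                                   # Sc: sex category
    return score


def has_bled(age: int, uncontrolled_htn: bool = False, abnormal_renal: bool = False,
             abnormal_liver: bool = False, prior_stroke: bool = False,
             bleeding_history: bool = False, labile_inr: bool = False,
             antiplatelet_or_nsaid: bool = False, alcohol: bool = False) -> int:
    """Standard HAS-BLED (range 0-9): renal/liver and drugs/alcohol score separately."""
    items = (uncontrolled_htn, abnormal_renal, abnormal_liver, prior_stroke,
             bleeding_history, labile_inr, antiplatelet_or_nsaid, alcohol)
    return sum(int(x) for x in items) + int(age > 65)  # E: elderly, age > 65


def merge(structured: StrokeFactors, rag: StrokeFactors) -> StrokeFactors:
    """Assumed rule: a clinical factor counts if either source reports it (logical OR)."""
    updates = {name: getattr(structured, name) or getattr(rag, name) for name in CLINICAL_FLAGS}
    return replace(structured, **updates)


# Hypothetical patient: structured codes capture only hypertension;
# the notes additionally document diabetes and a prior stroke.
structured = StrokeFactors(age=78, female=True, hypertension=True)
rag = StrokeFactors(age=78, female=True, diabetes=True, stroke_tia_te=True)
print(cha2ds2_vasc(structured), cha2ds2_vasc(merge(structured, rag)))
```

Here the structured-only score is 4 and the merged score is 7, illustrating how factors found only in notes can raise a patient's computed risk.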

Exposures: Use of a RAG-LLM model to abstract stroke and bleeding risk factors from structured and unstructured EHR data.

Main outcomes and measures: Accuracy of RAG-LLM-based risk factor abstraction against expert annotation. Secondary outcomes included efficiency, cross-cohort generalizability, and impact on anticoagulation eligibility based on risk stratification.
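
Per-component accuracy against expert annotation is the share of records in which an extracted flag matches the expert label. A minimal sketch on hypothetical toy data (four records, two components), reporting the min-max range in the same form the results use; the function names are hypothetical.

```python
def component_accuracy(predicted: list[dict[str, bool]],
                       annotated: list[dict[str, bool]]) -> dict[str, float]:
    """Share of records where the extracted flag matches the expert label, per component."""
    if not annotated or len(predicted) != len(annotated):
        raise ValueError("need the same non-empty set of records in both lists")
    return {
        comp: sum(p[comp] == a[comp] for p, a in zip(predicted, annotated)) / len(annotated)
        for comp in annotated[0]
    }


def accuracy_range(acc: dict[str, float]) -> tuple[float, float]:
    """Minimum and maximum accuracy across components."""
    return min(acc.values()), max(acc.values())


# Hypothetical toy data: expert labels vs. flags abstracted from structured data.
annotated = [
    {"hypertension": True, "diabetes": False},
    {"hypertension": True, "diabetes": True},
    {"hypertension": False, "diabetes": False},
    {"hypertension": True, "diabetes": True},
]
structured = [
    {"hypertension": False, "diabetes": False},
    {"hypertension": True, "diabetes": False},
    {"hypertension": False, "diabetes": False},
    {"hypertension": False, "diabetes": True},
]
acc = component_accuracy(structured, annotated)
print(acc, accuracy_range(acc))
```

In this toy example, missed hypertension codes pull that component's accuracy down to 0.5, while diabetes reaches 0.75.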

Results: In the validation cohort (mean age 74.8 years, 42.7% female), the RAG-LLM demonstrated superior performance across all metrics compared with structured data abstraction. For individual CHA₂DS₂-VASc components, accuracy with the RAG-LLM ranged from 0.94-1.00 (YNHHS) and 0.89-1.00 (MIMIC-IV), versus 0.66-0.92 (YNHHS) and 0.44-0.97 (MIMIC-IV) with structured data; results were similar for HAS-BLED components (0.94-1.00 and 0.89-1.00 vs 0.66-0.94 and 0.44-0.97). In the deployment study, among 3,207 patients classified as low or intermediate stroke risk with structured data, 62.1% (1,993) were reclassified as high risk with the RAG-LLM and would become eligible for anticoagulation. Similarly, 5.5% of patients classified as low bleeding risk with structured data were reclassified as high risk, substantially refining contraindication assessment.
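
The abstract does not state the exact risk-category cutoffs. This sketch assumes sex-specific CHA₂DS₂-VASc thresholds in the style of the 2019 AHA/ACC/HRS guideline update (high risk at a score of 2 or more in men and 3 or more in women) and the conventional HAS-BLED cutoff of 3 or more for high bleeding risk; the paper's definitions may differ. It counts upward reclassification on hypothetical patients and reproduces the abstract's 1,993 / 3,207 arithmetic.

```python
def stroke_category(score: int, female: bool) -> str:
    """Assumed cutoffs: high is >=2 in men and >=3 in women."""
    adjusted = score - int(female)  # female sex alone does not raise the category
    if adjusted <= 0:
        return "low"
    if adjusted == 1:
        return "intermediate"
    return "high"


def bleeding_category(has_bled_score: int) -> str:
    """HAS-BLED >= 3 is the conventional high-bleeding-risk cutoff."""
    return "high" if has_bled_score >= 3 else "low"


def upward_reclassification(before: list[str], after: list[str],
                            from_cats: set[str], to_cat: str = "high") -> tuple[int, int]:
    """Among patients in `from_cats` by structured data, count those moved to `to_cat`."""
    base = [i for i, cat in enumerate(before) if cat in from_cats]
    moved = sum(1 for i in base if after[i] == to_cat)
    return moved, len(base)


# Hypothetical patients: (female, structured CHA2DS2-VASc, RAG-LLM CHA2DS2-VASc).
patients = [(False, 1, 3), (True, 2, 2), (True, 1, 4), (False, 3, 3)]
before = [stroke_category(s, f) for f, s, _ in patients]
after = [stroke_category(r, f) for f, _, r in patients]
moved, base = upward_reclassification(before, after, {"low", "intermediate"})
print(moved, base)

# The abstract's deployment figure: 1,993 of 3,207 patients reclassified.
print(f"{1993 / 3207:.1%}")
```

The same counting function applies to bleeding risk by passing `bleeding_category` labels and `from_cats={"low"}`.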

Conclusions: A multimodal RAG-LLM accurately abstracts clinical variables from structured and unstructured EHR data to improve stroke and bleeding risk assessments in patients with AF, enhancing identification of appropriate anticoagulation candidates.

Retrieval-Augmented Generation for Extracting CHA₂DS₂-VASc Risk Factors from Unstructured Clinical Notes of Patients with Atrial Fibrillation.

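Background: Assessing stroke risk in patients with atrial fibrillation (AF) is essential for guiding anticoagulation therapy. CHA₂DS₂-VASc is a widely used score for defining this risk, but current assessment relies on manual calculation by clinicians or on approximation from structured electronic health record data elements. Unstructured clinical notes contain rich information that could strengthen risk assessment. We developed and validated a retrieval-augmented generation (RAG) approach to extract CHA₂DS₂-VASc risk factors from the unstructured notes of patients with AF.

Methods: We used a RAG architecture paired with the large language model Llama3.1 to extract features relevant to the CHA₂DS₂-VASc score from unstructured notes. The model was deployed on 1,000 random clinical notes (934 patients with AF) from the Yale New Haven Health System (YNHHS). To establish a gold standard, 2 clinicians manually reviewed and annotated CHA₂DS₂-VASc risk factors in a random subset of 200 notes. Each patient's CHA₂DS₂-VASc score was calculated using structured data alone and using structured data combined with RAG-identified risk factors. We evaluated performance across risk factors using the macro-averaged area under the receiver operating characteristic curve (AUROC). For external validation, we used 100 manually annotated clinical notes from the MIMIC-IV database.

Results: The RAG model performed well in extracting risk factors from clinical notes. Across the 1,000 clinical notes, RAG identified several risk factors more frequently than structured elements did, including hypertension (82.4% vs 26.2%), stroke/TIA (62.9% vs 45.5%), vascular disease (83.4% vs 56.6%), and diabetes (84.1% vs 47.2%). In the 200 expert-annotated notes, the RAG approach achieved high performance across risk factors, with AUROC between 0.96 and 0.98 for hypertension, diabetes, and age ≥75 years. Compared with structured data alone, incorporating RAG-identified risk factors raised CHA₂DS₂-VASc scores.

Conclusions: An LLM-optimized RAG can accurately extract CHA₂DS₂-VASc risk factors from the unstructured clinical notes of patients with AF. This approach enables computable risk assessment and can guide appropriate anticoagulation therapy.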
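
Macro-averaged AUROC can be illustrated with a minimal sketch: compute the AUROC for each risk factor as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count as one half), then take the unweighted mean across factors. The labels and scores below are hypothetical; with binary yes/no extractions, each per-factor AUROC reduces to the mean of sensitivity and specificity.

```python
from statistics import mean


def auroc(labels: list[int], scores: list[float]) -> float:
    """Rank-based AUROC: P(random positive outscores random negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(per_factor: dict[str, tuple[list[int], list[float]]]) -> float:
    """Macro average: unweighted mean of the per-factor AUROCs."""
    return mean(auroc(labels, scores) for labels, scores in per_factor.values())


# Hypothetical expert labels (1 = present) and model outputs for two risk factors.
per_factor = {
    "hypertension": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),  # probabilities: perfect ranking
    "diabetes": ([1, 0, 1, 0], [1, 0, 0, 0]),              # binary flags: one positive missed
}
print(macro_auroc(per_factor))
```

Here the per-factor AUROCs are 1.0 and 0.75, so the macro average is 0.875; each factor contributes equally regardless of its prevalence.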
