Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study.

JMIR AI, Vol 4, e58670. Published 2025-02-24. DOI: 10.2196/58670

Yanjun Gao, Ruizhe Li, Emma Croxford, John Caskey, Brian W Patterson, Matthew Churpek, Timothy Miller, Dmitriy Dligach, Majid Afshar

Abstract

Background: Electronic health records (EHRs) and routine documentation practices play a vital role in patients' daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives can overwhelm health care providers, increasing the risk of diagnostic inaccuracies. While large language models (LLMs) have shown potential across diverse language tasks, their application in health care must prioritize minimizing diagnostic errors and preventing patient harm. Integrating knowledge graphs (KGs) into LLMs is a promising approach because the structured knowledge in KGs could enhance LLMs' diagnostic reasoning by supplying contextually relevant medical information.

Objective: This study introduces DR.KNOWS (Diagnostic Reasoning Knowledge Graph System), a model that integrates Unified Medical Language System (UMLS)-based KGs with LLMs to improve diagnostic predictions from EHR data by retrieving contextually relevant knowledge paths aligned with patient-specific information.
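
The objective describes retrieving knowledge paths around concepts found in a patient's record. As a rough illustration of what candidate path retrieval over a concept graph can look like, the Python sketch below enumerates multi-hop paths by breadth-first search from the patient's concepts. The toy graph, its concept strings and relation labels, and the function name `enumerate_paths` are all hypothetical; this is not the DR.KNOWS retrieval procedure, and in a real pipeline the resulting candidates would still need to be scored by a ranker.

```python
from collections import deque

def enumerate_paths(kg, sources, max_hops=2):
    """Enumerate simple paths of 1..max_hops edges starting at each source concept.

    kg maps a concept to a list of (relation, neighbour) pairs. Each returned path
    is a tuple alternating concept, relation, concept, ...
    """
    paths = []
    for src in sources:
        queue = deque([(src,)])
        while queue:
            path = queue.popleft()
            if (len(path) - 1) // 2 >= max_hops:   # hop budget exhausted
                continue
            visited = set(path[0::2])               # concepts already on this path
            for relation, neighbour in kg.get(path[-1], []):
                if neighbour in visited:            # skip cycles
                    continue
                new_path = path + (relation, neighbour)
                paths.append(new_path)
                queue.append(new_path)
    return paths

# Hypothetical toy graph; a UMLS-based graph would use CUIs and UMLS relation labels.
toy_kg = {
    "fever": [("symptom_of", "pneumonia"), ("symptom_of", "influenza")],
    "pneumonia": [("treated_by", "antibiotic")],
}
for p in enumerate_paths(toy_kg, ["fever"], max_hops=2):
    print(" -> ".join(p))
```

Breadth-first order means shorter paths are emitted before longer ones, which keeps the candidate list easy to truncate; the number of paths grows quickly with `max_hops`, which is one reason a learned ranker is needed on top of plain enumeration.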

Methods: DR.KNOWS combines a stack graph isomorphism network for node embedding with an attention-based path ranker to identify and rank knowledge paths relevant to a patient's clinical context. We evaluated DR.KNOWS on 2 real-world EHR datasets from different geographic locations, comparing its performance with baseline models, including QuickUMLS and standard LLMs (Text-to-Text Transfer Transformer [T5] and ChatGPT). To assess diagnostic reasoning quality, we designed and implemented a human evaluation framework grounded in clinical safety metrics.
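
To make the two components named in the Methods more concrete, the NumPy sketch below shows a generic graph isomorphism network (GIN) update, h_v <- MLP((1 + eps) * h_v + sum of neighbour features), applied over several layers, followed by a simple attention-based scorer that ranks paths against a patient-context vector. It is an illustration under stated assumptions, not the authors' architecture: we assume "stack" means several GIN layers whose outputs are concatenated, the weights are random and untrained, and the names `gin_layer`, `stacked_gin`, and `rank_paths` are hypothetical.

```python
import numpy as np

def gin_layer(h, adj, w1, w2, eps=0.0):
    """One GIN update: MLP((1 + eps) * h_v + sum of neighbour features).

    h: (n, d_in) node features; adj: (n, n) 0/1 adjacency matrix.
    """
    agg = (1.0 + eps) * h + adj @ h        # self term plus neighbour sum
    hidden = np.maximum(0.0, agg @ w1)     # MLP layer 1 with ReLU
    return np.maximum(0.0, hidden @ w2)    # MLP layer 2 with ReLU

def stacked_gin(h, adj, layers, eps=0.0):
    """Run GIN layers in sequence and concatenate every layer's output
    (an assumption about what 'stack' means here)."""
    outputs = []
    for w1, w2 in layers:
        h = gin_layer(h, adj, w1, w2, eps)
        outputs.append(h)
    return np.concatenate(outputs, axis=1)

def softmax(x):
    z = np.exp(x - x.max())                # subtract max for numerical stability
    return z / z.sum()

def rank_paths(node_emb, paths, context):
    """Score each path (a list of node indices) by attending over its nodes
    with the patient-context vector as the query; return indices best-first."""
    scores = []
    for path in paths:
        nodes = node_emb[path]             # (len(path), d) node embeddings
        attn = softmax(nodes @ context)    # attention weight per node
        summary = attn @ nodes             # attention-weighted path vector
        scores.append(float(summary @ context))
    order = sorted(range(len(paths)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy usage with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
n_nodes, d_in, d_hid = 5, 4, 8
features = rng.normal(size=(n_nodes, d_in))
adj = np.zeros((n_nodes, n_nodes))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[u, v] = adj[v, u] = 1.0            # undirected chain graph
layers = [(rng.normal(size=(d_in, d_hid)), rng.normal(size=(d_hid, d_hid))),
          (rng.normal(size=(d_hid, d_hid)), rng.normal(size=(d_hid, d_hid)))]
emb = stacked_gin(features, adj, layers)   # shape (5, 16)
context = rng.normal(size=emb.shape[1])    # stand-in for an encoded patient note
order, scores = rank_paths(emb, [[0, 1, 2], [2, 3, 4], [1, 3]], context)
print(order, scores)
```

Concatenating per-layer outputs lets the path scorer see both local (1-hop) and wider neighbourhood information for each concept; in a trained system the context vector would come from an encoder over the patient's note rather than from random noise.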

Results: DR.KNOWS showed notable improvements over baseline models, with higher accuracy in extracting diagnostic concepts and better diagnostic prediction metrics. Prompt-based fine-tuning of Text-to-Text Transfer Transformer with DR.KNOWS knowledge paths achieved the highest ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence) and concept unique identifier (CUI) F₁-scores, highlighting the benefits of KG integration. Human evaluators found that the diagnostic rationales of DR.KNOWS aligned strongly with correct clinical reasoning, indicating improved abstraction and reasoning. Recognized limitations include potential biases within the KG data, which we addressed by emphasizing case-specific path selection and proposing future bias-mitigation strategies.
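
The Results report ROUGE-L and concept unique identifier (CUI) F₁-scores. A minimal Python sketch of both metrics follows, assuming whitespace tokenization and beta = 1 for ROUGE-L (the harmonic mean of LCS-based precision and recall) and a set-level comparison of predicted versus gold CUIs. The study's exact tokenization, normalization, and aggregation across notes are not stated here and may differ; for instance, the `rouge-score` package applies its own tokenizer and optional stemming. The CUIs in the example are placeholders, not real UMLS identifiers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens (beta = 1)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def cui_f1(predicted, gold):
    """Set-based F1 between predicted and gold concept unique identifiers."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)                   # CUIs present in both sets
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("acute kidney injury", "acute kidney injury sepsis"))
print(cui_f1(["C0000001", "C0000002"], ["C0000002", "C0000003"]))  # placeholder CUIs
```

ROUGE-L rewards matching surface wording in order, whereas CUI F1 compares normalized concepts, so a prediction that names the right diagnosis with a synonym can score low on ROUGE-L but still match on CUIs; reporting both captures these two views.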

Conclusions: DR.KNOWS offers a robust approach for enhancing diagnostic accuracy and reasoning by integrating structured KG knowledge into LLM-based clinical workflows. Although further work is required to address KG biases and extend generalizability, DR.KNOWS represents progress toward trustworthy artificial intelligence-driven clinical decision support, with a human evaluation framework focused on diagnostic safety and alignment with clinical standards.
