A self-supervised framework for laboratory data imputation in electronic health records.

Impact factor: 5.4. JCR: Q1. Category: Medicine, Research & Experimental

Communications medicine. Pub Date: 2025-07-01. DOI: 10.1038/s43856-025-00973-w

Samuel P Heilbroner, Curtis Carter, David M Vidmar, Erik T Mueller, Martin C Stumpe, Riccardo Miotto

Communications medicine, volume 5, issue 1, page 251. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12216283/pdf/

Citations: 0

Abstract

Background: Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to high levels of missingness. Existing imputation methods fall short, as they do not fully leverage patient clinical histories and are commonly not scalable to the large number of tests available in real-world data (RWD).

Methods: To address these shortcomings, we present Laboratory Imputation Framework for EHRs (LIFE), a self-supervised learning framework based on multi-head attention that is trained to impute any laboratory test value at any point in time in the patient's journey using their complete EHRs. This architecture (1) eliminates the need to train a different model for each laboratory test by jointly modeling all laboratory data of interest; and (2) better clinically contextualizes the predictions by leveraging additional EHR variables, such as diagnosis, medications, and discrete laboratory results.
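
The self-supervised setup described in the Methods can be illustrated with a minimal sketch: hide some observed laboratory values in a patient's record, ask a predictor to recover them from the rest of the record, and score only the hidden positions. This is not the authors' implementation. The `Event` structure, the event codes, the example patient, the mask rate, and the carry-forward predictor are all hypothetical stand-ins; in LIFE the predictor would be the multi-head attention model, whose details the abstract does not give.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One EHR entry (hypothetical schema, not taken from the paper)."""
    day: int                # days since the patient's first record
    code: str               # e.g. "LAB:hemoglobin", "DX:...", "RX:..."
    value: Optional[float]  # numeric lab result; None for non-lab events


def mask_lab_values(events, mask_rate, rng):
    """Hide a random subset of observed lab values.

    Returns (masked_events, targets), where targets maps the index of each
    hidden event to its true value. The test code and time stay visible so
    a model knows which value it is being asked to impute.
    """
    lab_idx = [i for i, e in enumerate(events) if e.value is not None]
    if not lab_idx:
        return list(events), {}
    n_mask = min(len(lab_idx), max(1, round(mask_rate * len(lab_idx))))
    chosen = set(rng.sample(lab_idx, n_mask))
    masked = [
        Event(e.day, e.code, None) if i in chosen else e
        for i, e in enumerate(events)
    ]
    targets = {i: events[i].value for i in chosen}
    return masked, targets


def carry_forward_predict(masked_events, index):
    """Placeholder predictor: latest earlier visible value of the same test."""
    query = masked_events[index]
    for e in reversed(masked_events[:index]):
        if e.code == query.code and e.value is not None:
            return e.value
    return None  # no earlier value of this test is visible


def masked_mae(masked_events, targets, predict):
    """Mean absolute error over the hidden positions that got a prediction."""
    errors = []
    for i, truth in targets.items():
        pred = predict(masked_events, i)
        if pred is not None:
            errors.append(abs(pred - truth))
    return sum(errors) / len(errors) if errors else None


# Illustrative patient record; codes and values are made up.
patient = [
    Event(0, "DX:example-diagnosis", None),
    Event(0, "LAB:hemoglobin", 12.0),
    Event(14, "RX:example-drug", None),
    Event(21, "LAB:hemoglobin", 10.0),
    Event(42, "LAB:hemoglobin", 9.0),
]

masked, targets = mask_lab_values(patient, mask_rate=0.34, rng=random.Random(0))
error = masked_mae(masked, targets, carry_forward_predict)
```

Because the targets come from values that were actually observed, no manual labels are needed, which is what makes the objective self-supervised. Swapping `carry_forward_predict` for a learned model that also reads the diagnosis and medication events is where a framework like LIFE would differ from this baseline.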

Results: We validate our framework using a large-scale, real-world dataset encompassing over 1 million oncology patients. Our results demonstrate that LIFE obtains superior or equivalent results compared to state-of-the-art baseline methods in 23 out of 25 evaluated laboratory tests and better enhances a downstream adverse event detection task in 7 out of 9 cases.

Conclusions: LIFE shows promise in accurately estimating missing laboratory values and enhancing the utilization of large-scale RWD in healthcare. This advancement could lead to better clinical models, more informed decision-making and improved patient outcomes.

Source journal

Communications medicine

Self-citation rate

0.00%

Publication volume