FedIMPUTE: Privacy-preserving missing value imputation for multi-site heterogeneous electronic health records

Impact factor: 4.0; Zone 2 (Medicine); JCR Q2: Computer Science, Interdisciplinary Applications

Journal of Biomedical Informatics; Pub Date: 2025-03-05; DOI: 10.1016/j.jbi.2025.104780

Siqi Li, Mengying Yan, Ruizhi Yuan, Molei Liu, Nan Liu, Chuan Hong

Journal of Biomedical Informatics, Volume 165, Article 104780. Publisher page: https://www.sciencedirect.com/science/article/pii/S1532046425000097

Citations: 0

Abstract

Objectives:

We propose FedIMPUTE, a communication-efficient federated learning (FL) based approach for missing value imputation (MVI). Our method enables multiple sites to collaboratively perform MVI in a privacy-preserving manner, addressing challenges of data-sharing constraints and population heterogeneity.

Methods:

We begin by conducting MVI locally at each participating site, followed by the application of various FL strategies, ranging from basic to advanced, to federate local MVI models without sharing site-specific data. The federated model is then broadcast and used by each site for MVI. We evaluate FedIMPUTE using both simulation studies and a real-world application on electronic health records (EHRs) to predict emergency department (ED) outcomes as a proof of concept.
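The Methods paragraph describes a loop of local fitting, federation, and broadcast without naming the models involved. The following is a minimal Python sketch of that loop under stated assumptions that do not come from the paper: one always-observed covariate `x`, one partially missing variable `y`, simple linear regression as the local imputation model, and a FedAvg-style sample-size-weighted average of coefficients as a "basic" federation strategy. The names `fit_local`, `federate`, and `impute` are hypothetical.

```python
"""Hypothetical sketch of federated regression imputation (FedAvg-style).

Assumptions, not taken from the FedIMPUTE paper: a single covariate x is
always observed, y may be missing (None), the local model is y = a + b*x,
and the server averages coefficients weighted by complete-case counts.
"""
from typing import List, Optional, Tuple

# (x, y) for one patient; y is None when the value is missing.
Row = Tuple[float, Optional[float]]
# Per-site summary sent to the server: intercept, slope, complete-case count.
Summary = Tuple[float, float, int]


def fit_local(rows: List[Row]) -> Summary:
    """Fit y = a + b*x on the site's complete cases; return (a, b, n)."""
    pairs = [(x, y) for x, y in rows if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    return mean_y - b * mean_x, b, n


def federate(summaries: List[Summary]) -> Tuple[float, float]:
    """Server step: average coefficients, weighting each site by its n."""
    total = sum(n for _, _, n in summaries)
    a = sum(a_k * n for a_k, _, n in summaries) / total
    b = sum(b_k * n for _, b_k, n in summaries) / total
    return a, b


def impute(rows: List[Row], model: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Site step: replace each missing y with the broadcast model's prediction."""
    a, b = model
    return [(x, y if y is not None else a + b * x) for x, y in rows]


# Toy data for two hypothetical sites (illustrative, not from the paper).
site_a = [(1.0, 3.0), (2.0, 5.0), (3.0, None), (4.0, 9.0)]
site_b = [(0.0, 2.0), (1.0, 4.0), (2.0, None)]

global_model = federate([fit_local(site_a), fit_local(site_b)])
imputed_a = impute(site_a, global_model)
imputed_b = impute(site_b, global_model)
```

Only the tuple `(a, b, n)` leaves each site. A plain weighted average ignores between-site heterogeneity, which the abstract names as a core challenge; the more advanced federation strategies the authors mention are not reproduced here.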

Results:

Simulation studies show that FedIMPUTE outperforms all baseline MVI methods under comparison, improving downstream prediction performance and effectively handling data heterogeneity across sites. On ED datasets from three hospitals within the Duke University Health System (DUHS), FedIMPUTE achieves the lowest mean squared error (MSE) among the benchmark MVI methods, indicating superior imputation accuracy. Its downstream prediction performance also outperforms or matches that of the other benchmark methods.
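The abstract reports imputation accuracy as MSE without giving its exact form. A common definition, assumed here rather than taken from the paper, masks a set $\mathcal{M}$ of truly observed entries, imputes them, and averages the squared differences:

$$
\mathrm{MSE} = \frac{1}{|\mathcal{M}|} \sum_{(i,j) \in \mathcal{M}} \left( x_{ij} - \hat{x}_{ij} \right)^2
$$

Here $x_{ij}$ is the held-out true value of variable $j$ for patient $i$ and $\hat{x}_{ij}$ is its imputed value. When variables are on different scales, they are often standardized before this average is taken.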

Conclusion:

FedIMPUTE enhances the performance of downstream risk prediction tasks, particularly for sites with high missing data rates and small sample sizes. It is easy to implement and communication-efficient, requiring sites to share only non-patient-level summary statistics.
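The Conclusion notes that sites share only non-patient-level summary statistics. As an illustration of how such sharing can be one-shot and still exact, the sketch below assumes the simple linear imputation model y = a + b*x and has each site send five aggregate sums. This is an illustrative assumption; the abstract does not specify which statistics FedIMPUTE exchanges. The names `site_summary` and `pooled_fit` are hypothetical.

```python
"""Hypothetical sketch: one-shot federation via aggregate summary statistics.

Assumption (illustrative, not the FedIMPUTE specification): each site sends
five sums over its complete cases for the model y = a + b*x. Adding them
across sites reproduces the pooled ordinary least-squares fit exactly.
"""
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, Optional[float]]  # (x, y); y is None when missing
KEYS = ("n", "sx", "sy", "sxx", "sxy")


def site_summary(rows: List[Row]) -> Dict[str, float]:
    """Aggregate complete-case statistics; no patient-level row is returned."""
    pairs = [(x, y) for x, y in rows if y is not None]
    return {
        "n": float(len(pairs)),
        "sx": sum(x for x, _ in pairs),
        "sy": sum(y for _, y in pairs),
        "sxx": sum(x * x for x, _ in pairs),
        "sxy": sum(x * y for x, y in pairs),
    }


def pooled_fit(summaries: List[Dict[str, float]]) -> Tuple[float, float]:
    """Server step: sum statistics across sites, solve the OLS normal equations."""
    t = {k: sum(s[k] for s in summaries) for k in KEYS}
    b = (t["n"] * t["sxy"] - t["sx"] * t["sy"]) / (t["n"] * t["sxx"] - t["sx"] ** 2)
    a = (t["sy"] - b * t["sx"]) / t["n"]
    return a, b


# Toy data for two hypothetical sites (illustrative, not from the paper).
site_a = [(1.0, 3.0), (2.0, 5.0), (3.0, None), (4.0, 9.0)]
site_b = [(0.0, 2.0), (1.0, 4.0), (2.0, None)]

federated = pooled_fit([site_summary(site_a), site_summary(site_b)])
centralized = pooled_fit([site_summary(site_a + site_b)])
```

`federated` equals `centralized` because, for this model, the least-squares solution depends on the data only through these five sums. Averaging per-site fitted coefficients, by contrast, does not in general reproduce the pooled fit when sites differ.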


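Source journal

Journal of Biomedical Informatics (Medicine / Computer Science: Interdisciplinary Applications)

CiteScore: 8.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 32 days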

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.