Development of time to event prediction models using federated learning.

IF 3.9; Zone 3 (Medicine); JCR Q1, HEALTH CARE SCIENCES & SERVICES

BMC Medical Research Methodology, 25(1): 143. Published 2025-05-26. DOI: 10.1186/s12874-025-02598-y

Rasmus Rask Kragh Jørgensen, Jonas Faartoft Jensen, Tarec El-Galaly, Martin Bøgsted, Rasmus Froberg Brøndum, Mikkel Runason Simonsen, Lasse Hjort Jakobsen

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12105200/pdf/

Citations: 0

Abstract

Background: In a wide range of diseases, multiple data sources are needed to obtain enough data for model training. However, centrally pooling multiple data sources while protecting each patient's sensitive data can require a cumbersome process involving many institutional bodies. Alternatively, federated learning (FL) can be used to train models on data located at multiple sites.

Method: We propose two methods, relying on FL algorithms, for training time-to-event prediction models on distributed data. Both approaches incorporate steps that allow prediction of individual-level survival curves without exposing individual-level event times. For Cox proportional hazards models, this is accomplished by using a kernel smoother for the baseline hazard function. The other proposed method is based on general parametric likelihood theory for right-censored data. We compared the two methods in four simulation experiments and on one real-world dataset, predicting the survival probability of patients with Hodgkin lymphoma (HL).
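A minimal sketch of how a kernel-smoothed baseline hazard could be assembled from site-level aggregates, assuming the Cox coefficients `beta` have already been estimated (for example by a federated Cox fit, not shown). Each site returns only two vectors evaluated on a shared time grid: kernel-smoothed event counts and the exp(x'β)-weighted risk set. The coordinator divides their sums to obtain the smoothed baseline hazard, integrates it, and predicts individual survival curves S(t | x) = exp(-H0(t) exp(x'β)). The Epanechnikov kernel, the bandwidth, the grid, and all function names are illustrative assumptions, not the authors' specification, and sharing grid-level risk-set sums is not by itself a formal privacy guarantee.

```python
import numpy as np

def epanechnikov(u):
    # Epanechnikov kernel, supported on [-1, 1]
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_hazard_parts(x, time, event, beta, grid, bandwidth):
    """Run at one site; returns only two vectors evaluated on the shared grid."""
    # Kernel-smoothed event count at each grid point: sum_i delta_i * K_b(g - t_i)
    u = (grid[:, None] - time[None, :]) / bandwidth
    numer = (epanechnikov(u) * event[None, :]).sum(axis=1) / bandwidth
    # Weighted risk set at each grid point: sum_i 1{t_i >= g} * exp(x_i' beta)
    risk = np.exp(x @ beta)
    denom = ((time[None, :] >= grid[:, None]) * risk[None, :]).sum(axis=1)
    return numer, denom

def federated_baseline_hazard(sites, beta, grid, bandwidth):
    """Coordinator: sum the site aggregates, then form the smoothed hazard."""
    numer = np.zeros_like(grid)
    denom = np.zeros_like(grid)
    for x, time, event in sites:  # in practice, one request per site
        n_k, d_k = local_hazard_parts(x, time, event, beta, grid, bandwidth)
        numer += n_k
        denom += d_k
    hazard = np.divide(numer, denom, out=np.zeros_like(grid), where=denom > 0)
    # Cumulative baseline hazard H0 on the grid (trapezoid rule, H0(grid[0]) = 0)
    steps = np.diff(grid) * (hazard[1:] + hazard[:-1]) / 2.0
    cumhaz = np.concatenate([[0.0], np.cumsum(steps)])
    return hazard, cumhaz

def predict_survival(cumhaz, x_new, beta):
    # Individual survival curve S(t | x) = exp(-H0(t) * exp(x' beta)) on the grid
    return np.exp(-cumhaz * np.exp(x_new @ beta))

# Usage with simulated data: baseline hazard exp(-1), one covariate with beta = 0.5
rng = np.random.default_rng(7)
beta = np.array([0.5])  # assumed already estimated, e.g. by a federated Cox fit
sites = []
for n in (800, 1200, 1000):
    x = rng.normal(size=(n, 1))
    t_event = rng.exponential(1.0 / np.exp(-1.0 + x @ beta))
    t_cens = rng.exponential(5.0, size=n)
    sites.append((x, np.minimum(t_event, t_cens), (t_event <= t_cens).astype(float)))

grid = np.linspace(0.0, 3.0, 61)
hazard, cumhaz = federated_baseline_hazard(sites, beta, grid, bandwidth=0.3)
surv = predict_survival(cumhaz, np.array([1.0]), beta)
```

The ratio-of-smoothed-sums form is used here because both numerator and denominator are plain sums over patients, so they add up across sites exactly as they would in pooled data; like any kernel estimator, it is biased near the start of the grid, where the kernel window is cut off at zero.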

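For the parametric route, a minimal sketch assuming an exponential model with log-linear hazard λ(x) = exp(x'β), the simplest case of the right-censored log-likelihood ℓ(β) = Σ_i [δ_i x_i'β − t_i exp(x_i'β)]. Each site returns only the summed score vector and Hessian of its own contribution; because the log-likelihood is a sum over patients, Newton-Raphson iterations on these totals reproduce the pooled-data fit. The model family, the optimizer, and the function names are assumptions for illustration; the paper's exact model and communication protocol are not described in this abstract.

```python
import numpy as np

def local_stats(X, time, event, beta):
    """Run at one site: return only the summed score vector and Hessian."""
    mu = time * np.exp(X @ beta)        # cumulative hazard H(t_i | x_i) for each row
    grad = X.T @ (event - mu)           # score of sum_i [delta_i x_i'beta - t_i exp(x_i'beta)]
    hess = -(X * mu[:, None]).T @ X     # Hessian (negative definite)
    return grad, hess

def federated_fit(sites, n_params, max_iter=50, tol=1e-10):
    """Coordinator: Newton-Raphson on the sum of site-level scores and Hessians."""
    beta = np.zeros(n_params)
    for _ in range(max_iter):
        grad = np.zeros(n_params)
        hess = np.zeros((n_params, n_params))
        for X, time, event in sites:    # in practice, one request per site
            g, h = local_stats(X, time, event, beta)
            grad += g
            hess += h
        step = np.linalg.solve(hess, grad)
        beta = beta - step              # Newton update toward the maximum
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_survival(times, x_new, beta):
    # Exponential model: S(t | x) = exp(-t * exp(x' beta)); x_new includes the intercept
    return np.exp(-np.asarray(times) * np.exp(x_new @ beta))

# Usage with simulated data: intercept -1.0, one covariate with effect 0.5
rng = np.random.default_rng(1)
true_beta = np.array([-1.0, 0.5])
sites = []
for n in (800, 1200, 1000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    t_event = rng.exponential(1.0 / np.exp(X @ true_beta))
    t_cens = rng.exponential(5.0, size=n)
    sites.append((X, np.minimum(t_event, t_cens), (t_event <= t_cens).astype(float)))

beta_hat = federated_fit(sites, n_params=2)
```

Swapping in another parametric family (e.g. Weibull) only changes the per-site score and Hessian; the aggregation step stays the same.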
Results: The simulations demonstrated that the FL models performed similarly to the non-distributed case in all four experiments, with only slight deviations in predicted survival probabilities compared with the true model. Our findings were similar in the real-world advanced-stage HL example, where comparing the FL models with their non-distributed versions revealed only small deviations in performance.

Conclusion: The proposed procedures enable training of time-to-event models on data distributed across sites, without direct sharing of individual-level data and event times, while retaining predictive performance on par with non-distributed approaches.


Source journal

BMC Medical Research Methodology (Medicine: Health Care)

CiteScore: 6.50

Self-citation rate: 2.50%

Articles published: 298

Review time: 3-8 weeks

Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.