Privacy-preserving federated prediction of pain intensity change based on multi-center survey data

Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullah, Niklas Probul, Jan Baumbach, Linda Baumbach
{"title":"Privacy-preserving federated prediction of pain intensity change based on multi-center survey data","authors":"Supratim Das, Mahdie Rafie, Paula Kammer, Søren T. Skou, Dorte T. Grønne, Ewa M. Roos, André Hajek, Hans-Helmut König, Md Shihab Ullaha, Niklas Probul, Jan Baumbacha, Linda Baumbach","doi":"arxiv-2409.07997","DOIUrl":null,"url":null,"abstract":"Background: Patient-reported survey data are used to train prognostic models\naimed at improving healthcare. However, such data are typically available\nmulti-centric and, for privacy reasons, cannot easily be centralized in one\ndata repository. Models trained locally are less accurate, robust, and\ngeneralizable. We present and apply privacy-preserving federated machine\nlearning techniques for prognostic model building, where local survey data\nnever leaves the legally safe harbors of the medical centers. Methods: We used\ncentralized, local, and federated learning techniques on two healthcare\ndatasets (GLA:D data from the five health regions of Denmark and international\nSHARE data of 27 countries) to predict two different health outcomes. We\ncompared linear regression, random forest regression, and random forest\nclassification models trained on local data with those trained on the entire\ndata in a centralized and in a federated fashion. Results: In GLA:D data,\nfederated linear regression (R2 0.34, RMSE 18.2) and federated random forest\nregression (R2 0.34, RMSE 18.3) models outperform their local counterparts\n(i.e., R2 0.32, RMSE 18.6, R2 0.30, RMSE 18.8) with statistical significance.\nWe also found that centralized models (R2 0.34, RMSE 18.2, R2 0.32, RMSE 18.5,\nrespectively) did not perform significantly better than the federated models.\nIn SHARE, the federated model (AC 0.78, AUROC: 0.71) and centralized model (AC\n0.84, AUROC: 0.66) perform significantly better than the local models (AC:\n0.74, AUROC: 0.69). Conclusion: Federated learning enables the training of\nprognostic models from multi-center surveys without compromising privacy and\nwith only minimal or no compromise regarding model performance.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07997","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Patient-reported survey data are used to train prognostic models aimed at improving healthcare. However, such data are typically only available in a multi-center, distributed form and, for privacy reasons, cannot easily be pooled in one central data repository. Models trained only on local data are less accurate, robust, and generalizable. We present and apply privacy-preserving federated machine learning techniques for prognostic model building, in which local survey data never leave the legally safe harbors of the medical centers.

Methods: We applied centralized, local, and federated learning techniques to two healthcare datasets (GLA:D data from the five health regions of Denmark and international SHARE data from 27 countries) to predict two different health outcomes. We compared linear regression, random forest regression, and random forest classification models trained on local data with models trained on the entire data in a centralized and in a federated fashion.

Results: On the GLA:D data, the federated linear regression (R2 0.34, RMSE 18.2) and federated random forest regression (R2 0.34, RMSE 18.3) models outperformed their local counterparts (R2 0.32, RMSE 18.6 and R2 0.30, RMSE 18.8, respectively) with statistical significance. We also found that the centralized models (R2 0.34, RMSE 18.2 and R2 0.32, RMSE 18.5, respectively) did not perform significantly better than the federated models. On SHARE, the federated model (accuracy 0.78, AUROC 0.71) and the centralized model (accuracy 0.84, AUROC 0.66) performed significantly better than the local models (accuracy 0.74, AUROC 0.69).

Conclusion: Federated learning enables the training of prognostic models from multi-center surveys without compromising privacy and with only minimal or no compromise regarding model performance.
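The abstract compares three training settings for the same model families: local (each center alone), centralized (all raw data pooled), and federated (raw data stay at the centers and only model parameters are exchanged). The sketch below illustrates this comparison for linear regression on simulated multi-center data. The federated model is built here by sample-size-weighted averaging of locally fitted coefficients (a FedAvg-style aggregation); this aggregation scheme, the simulated data, and all variable names are illustrative assumptions, not the exact protocol or data used in the paper.

```python
# Minimal sketch (assumptions, not the paper's implementation): compare
# local, centralized, and federated linear regression on simulated
# multi-center survey data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)

def make_center(n):
    """Simulate one center: five survey-derived predictors and a noisy outcome."""
    X = rng.normal(size=(n, 5))
    y = X @ np.array([3.0, -2.0, 1.5, 0.5, 0.0]) + rng.normal(scale=10.0, size=n)
    return X, y

# Three hypothetical centers of different sizes, plus a held-out test set.
centers = [make_center(n) for n in (200, 350, 150)]
X_test, y_test = make_center(500)

# Local: each center fits a model only on its own data.
local_models = [LinearRegression().fit(X, y) for X, y in centers]

# Centralized: all raw data pooled in one place (the privacy-critical baseline).
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
central = LinearRegression().fit(X_all, y_all)

# Federated: only model parameters leave each center; a coordinator averages
# them weighted by local sample size (one round of FedAvg-style aggregation).
weights = np.array([len(y) for _, y in centers], dtype=float)
weights /= weights.sum()
fed = LinearRegression()
fed.coef_ = np.average([m.coef_ for m in local_models], axis=0, weights=weights)
fed.intercept_ = np.average([m.intercept_ for m in local_models], weights=weights)

for name, model in [("local (center 0)", local_models[0]),
                    ("centralized", central),
                    ("federated", fed)]:
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    print(f"{name:18s} R2={r2_score(y_test, pred):.2f}  RMSE={rmse:.1f}")
```

Reporting R2 and RMSE for all three settings on a common test set mirrors the comparison in the Results: the federated model should recover most of the centralized model's performance while no raw records ever leave a center.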