Federated Learning in Healthcare: A Benchmark Comparison of Engineering and Statistical Approaches for Structured Data Analysis.

Health Data Science, Vol. 4, Article 0196. Published 2024-12-04 (eCollection date: 2024-01-01). DOI: 10.34133/hds.0196

Siqi Li, Di Miao, Qiming Wu, Chuan Hong, Danny D'Agostino, Xin Li, Yilin Ning, Yuqing Shang, Ziwen Wang, Molei Liu, Huazhu Fu, Marcus Eng Hock Ong, Hamed Haddadi, Nan Liu

Abstract

Background: Federated learning (FL) holds promise for safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also developed privacy-preserving algorithms, though these are less recognized. Our goal was to bridge this gap with the first comprehensive comparison of FL frameworks from both domains.

Methods: We assessed 7 FL frameworks, encompassing both engineering-based and statistical FL algorithms, and compared them against local and centralized fitting of two models: logistic regression and the least absolute shrinkage and selection operator (Lasso). Our evaluation used both simulated data and real-world emergency department data, comparing both the estimated model coefficients and the performance of model predictions.

Results: The findings reveal that statistical FL algorithms produce much less biased estimates of model coefficients. Conversely, engineering-based methods can yield models with slightly better prediction performance, occasionally outperforming both centralized and statistical FL models.

Conclusion: This study underscores the relative strengths and weaknesses of both types of methods, providing recommendations for their selection based on distinct study characteristics. Furthermore, we emphasize the critical need to raise awareness of these methods and to integrate them into future applications of FL within the healthcare domain.
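To make the engineering-versus-statistical distinction concrete, here is a minimal sketch contrasting two ways of fitting a logistic regression across sites without pooling raw records. It is an illustrative assumption, not the authors' code or any of the 7 frameworks they benchmarked. `fedavg_logistic` follows the FedAvg pattern typical of engineering-based FL (a few local gradient steps per site, then a sample-size-weighted average of site coefficients). `distributed_newton_logistic` follows a common statistical pattern in which each site shares only its gradient and Hessian at the current estimate, so every Newton step equals the pooled one. All function names, the simulated sites, the learning rate, and the round counts are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_sites(n_sites=3, n_per_site=500, beta=(-1.0, 0.8, -0.5), seed=0):
    """Hypothetical simulated data: each site has a shifted covariate distribution."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta)
    sites = []
    for k in range(n_sites):
        x = rng.normal(loc=0.5 * k, scale=1.0, size=(n_per_site, len(beta) - 1))
        X = np.column_stack([np.ones(n_per_site), x])  # prepend intercept column
        y = rng.binomial(1, sigmoid(X @ beta))
        sites.append((X, y))
    return sites


def site_grad_hess(beta, X, y):
    """Gradient and Hessian of one site's log-likelihood; only these leave the site."""
    p = sigmoid(X @ beta)
    grad = X.T @ (y - p)
    hess = -(X.T * (p * (1.0 - p))) @ X  # -sum_i w_i x_i x_i^T
    return grad, hess


def distributed_newton_logistic(sites, n_iter=25, tol=1e-10):
    """Statistical-style FL sketch: sum site gradients/Hessians, take a pooled Newton step."""
    d = sites[0][0].shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        grad, hess = np.zeros(d), np.zeros((d, d))
        for X, y in sites:
            g, h = site_grad_hess(beta, X, y)
            grad += g
            hess += h
        step = np.linalg.solve(hess, grad)
        beta = beta - step  # Newton update for maximizing the log-likelihood
        if np.max(np.abs(step)) < tol:
            break
    return beta


def centralized_logistic(sites, n_iter=25):
    """Reference fit on pooled data, i.e. what FL aims to approximate without pooling."""
    X = np.vstack([X for X, _ in sites])
    y = np.concatenate([y for _, y in sites])
    return distributed_newton_logistic([(X, y)], n_iter=n_iter)


def fedavg_logistic(sites, rounds=50, local_epochs=5, lr=0.1):
    """Engineering-style FedAvg sketch: local gradient ascent, then size-weighted averaging."""
    d = sites[0][0].shape[1]
    n_total = sum(len(y) for _, y in sites)
    beta = np.zeros(d)
    for _ in range(rounds):
        updates = []
        for X, y in sites:
            local = beta.copy()
            for _ in range(local_epochs):
                # Full-batch gradient of the mean log-likelihood on this site only.
                local += lr * X.T @ (y - sigmoid(X @ local)) / len(y)
            updates.append(len(y) / n_total * local)
        beta = np.sum(updates, axis=0)
    return beta


if __name__ == "__main__":
    sites = simulate_sites()
    print("centralized:       ", np.round(centralized_logistic(sites), 3))
    print("distributed Newton:", np.round(distributed_newton_logistic(sites), 3))
    print("FedAvg:            ", np.round(fedavg_logistic(sites), 3))
```

Because the summed site gradients and Hessians equal their pooled counterparts, the distributed Newton estimate matches the centralized fit up to floating-point error, whereas the FedAvg estimate depends on the learning rate, number of rounds, and local epochs and need not coincide with it. This is only a toy illustration of why coefficient recovery can differ between the two families; it says nothing about prediction performance, where the abstract reports that engineering-based methods can do slightly better.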
