Federated Mixed Effects Logistic Regression Based on One-Time Shared Summary Statistics

Impact factor: 1.8 · Zone 3 (Biology) · JCR Q4 (Mathematical & Computational Biology)
Marie Analiz April Limpoco, Christel Faes, Niel Hens
Journal: Biometrical Journal, volume 67, issue 5
Publication date: 2025-09-29
Publication type: Journal Article
DOI: 10.1002/bimj.70080
URL: https://onlinelibrary.wiley.com/doi/10.1002/bimj.70080
Open access: no
Citations: 0

Abstract


Federated Mixed Effects Logistic Regression Based on One-Time Shared Summary Statistics


Upholding data privacy, especially in medical research, often makes individual-level patient data difficult to access. Estimating mixed effects binary logistic regression models from data held by multiple data providers, such as hospitals, thus becomes more challenging. Federated learning has emerged as an option that preserves the privacy of individual observations while still estimating a global model that can be interpreted at the individual level, but it usually involves iterative communication between the data providers and the data analyst. In this paper, we present a strategy for estimating a mixed effects binary logistic regression model that requires data providers to share summary statistics only once. The strategy generates pseudo-data whose summary statistics match those of the actual data and uses these pseudo-data in the model estimation process in place of the unavailable actual data. It can include multiple predictors, which may be a combination of continuous and categorical variables. Through simulation, we show that our approach estimates the true model at least as well as the approach that requires the pooled individual observations. An illustrative example using real data is provided. Unlike typical federated learning algorithms, our approach eliminates infrastructure requirements and security issues while remaining communication efficient and accounting for heterogeneity.
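
The abstract does not say which summary statistics are shared or how the pseudo-data are generated, so the following is only a minimal sketch of the general idea of "pseudo-data whose summary statistics match those of the actual data", not the authors' procedure. It assumes a single continuous predictor for which a data provider shares the sample size, mean, and standard deviation; the function name `pseudo_sample` and the example numbers are hypothetical.

```python
import random
import statistics


def pseudo_sample(mean, sd, n, seed=0):
    """Return n pseudo-values whose sample mean and sample SD equal `mean` and `sd`.

    Assumption for illustration only: the provider shares (n, mean, sd)
    for one continuous variable. Requires n >= 2.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Standardise the draws with their own sample moments, so the result
    # has sample mean 0 and sample SD 1 up to floating-point error.
    m = statistics.fmean(draws)
    s = statistics.stdev(draws)
    # Rescale and shift to the shared summary statistics.
    return [mean + sd * (d - m) / s for d in draws]


# Hypothetical summaries shared once by one provider: n=200, mean=52.3, sd=11.8.
values = pseudo_sample(mean=52.3, sd=11.8, n=200, seed=1)
print(round(statistics.fmean(values), 6), round(statistics.stdev(values), 6))
```

The analyst never sees the original values; only the three shared numbers determine the moments of the regenerated sample. Handling categorical predictors, a binary outcome, and the joint relationships a mixed effects logistic model depends on would require further summary statistics that this sketch does not cover.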

Source journal
Biometrical Journal (Biology: Mathematical & Computational Biology)
CiteScore: 3.20
Self-citation rate: 5.90%
Articles published: 119
Review time: 6-12 weeks
Journal description: Biometrical Journal publishes papers on statistical methods and their applications in the life sciences, including medicine, environmental sciences, and agriculture. Methodological developments should be motivated by an interesting and relevant problem from these areas. Ideally, the manuscript should include a description of the problem and a section detailing the application of the new methodology to the problem. Case studies, review articles, and letters to the editors are also welcome. Papers containing only extensive mathematical theory are not suitable for publication in Biometrical Journal.