Marie Analiz April Limpoco, Christel Faes, Niel Hens
{"title":"基于一次性共享汇总统计的联邦混合效应逻辑回归。","authors":"Marie Analiz April Limpoco, Christel Faes, Niel Hens","doi":"10.1002/bimj.70080","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Upholding data privacy, especially in medical research, has become tantamount to facing difficulties in accessing individual-level patient data. Estimating mixed effects binary logistic regression models involving data from multiple data providers, like hospitals, thus becomes more challenging. Federated learning has emerged as an option to preserve the privacy of individual observations while still estimating a global model that can be interpreted on the individual level, but it usually involves iterative communication between the data providers and the data analyst. In this paper, we present a strategy to estimate a mixed effects binary logistic regression model that requires data providers to share summary statistics only once. It involves generating pseudo-data whose summary statistics match those of the actual data and using these in the model estimation process instead of the actual unavailable data. Our strategy is able to include multiple predictors, which can be a combination of continuous and categorical variables. Through simulation, we show that our approach estimates the true model at least as good as the one that requires the pooled individual observations. An illustrative example using real data is provided. Unlike typical federated learning algorithms, our approach eliminates infrastructure requirements and security issues while being communication efficient and while accounting for heterogeneity.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 5","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Federated Mixed Effects Logistic Regression Based on One-Time Shared Summary Statistics\",\"authors\":\"Marie Analiz April Limpoco, Christel Faes, Niel Hens\",\"doi\":\"10.1002/bimj.70080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Upholding data privacy, especially in medical research, has become tantamount to facing difficulties in accessing individual-level patient data. Estimating mixed effects binary logistic regression models involving data from multiple data providers, like hospitals, thus becomes more challenging. Federated learning has emerged as an option to preserve the privacy of individual observations while still estimating a global model that can be interpreted on the individual level, but it usually involves iterative communication between the data providers and the data analyst. In this paper, we present a strategy to estimate a mixed effects binary logistic regression model that requires data providers to share summary statistics only once. It involves generating pseudo-data whose summary statistics match those of the actual data and using these in the model estimation process instead of the actual unavailable data. Our strategy is able to include multiple predictors, which can be a combination of continuous and categorical variables. Through simulation, we show that our approach estimates the true model at least as good as the one that requires the pooled individual observations. An illustrative example using real data is provided. Unlike typical federated learning algorithms, our approach eliminates infrastructure requirements and security issues while being communication efficient and while accounting for heterogeneity.</p></div>\",\"PeriodicalId\":55360,\"journal\":{\"name\":\"Biometrical Journal\",\"volume\":\"67 5\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2025-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biometrical Journal\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/bimj.70080\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrical Journal","FirstCategoryId":"99","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/bimj.70080","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Federated Mixed Effects Logistic Regression Based on One-Time Shared Summary Statistics
Upholding data privacy, especially in medical research, has become tantamount to facing difficulties in accessing individual-level patient data. Estimating mixed effects binary logistic regression models involving data from multiple data providers, like hospitals, thus becomes more challenging. Federated learning has emerged as an option to preserve the privacy of individual observations while still estimating a global model that can be interpreted on the individual level, but it usually involves iterative communication between the data providers and the data analyst. In this paper, we present a strategy to estimate a mixed effects binary logistic regression model that requires data providers to share summary statistics only once. It involves generating pseudo-data whose summary statistics match those of the actual data and using these in the model estimation process instead of the actual unavailable data. Our strategy is able to include multiple predictors, which can be a combination of continuous and categorical variables. Through simulation, we show that our approach estimates the true model at least as good as the one that requires the pooled individual observations. An illustrative example using real data is provided. Unlike typical federated learning algorithms, our approach eliminates infrastructure requirements and security issues while being communication efficient and while accounting for heterogeneity.
期刊介绍:
Biometrical Journal publishes papers on statistical methods and their applications in life sciences including medicine, environmental sciences and agriculture. Methodological developments should be motivated by an interesting and relevant problem from these areas. Ideally the manuscript should include a description of the problem and a section detailing the application of the new methodology to the problem. Case studies, review articles and letters to the editors are also welcome. Papers containing only extensive mathematical theory are not suitable for publication in Biometrical Journal.