Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics.

IF 3.1, Zone 3 (Medicine), JCR Q2 (Medical Informatics)

JMIR Medical Informatics. Pub Date: 2024-07-19. DOI: 10.2196/53622

Félix Camirand Lemyre, Simon Lévesque, Marie-Pier Domingue, Klaus Herrmann, Jean-François Ethier



Abstract

Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.

Objective: This paper aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data; (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions; and (3) adapting existing methods to make them fully usable in health settings.

Methods: A scoping review methodology was employed for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and to derive the properties of the resulting estimators.

Results: From the review, 41 articles were selected, and six approaches were extracted for conducting standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information-sharing requirements and operational complexity.

Conclusions: This paper contributes to the field of health analytics by providing an overview of the methods that can be used with horizontally partitioned data, by adapting these methods to the context of heterogeneous health data and by clarifying the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with the sharing of summary statistics.
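Illustration: The idea of fitting a GLM on horizontally partitioned data by exchanging only summary statistics can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of any of the six approaches identified in the review: it assumes a logistic model with an intercept and one covariate, a trusted coordinator, and Newton-Raphson updates computed from per-node score vectors and information matrices. All function names, node sizes, and simulated values are hypothetical.

```python
import math
import random


def simulate_node(n, mean, rng):
    """Hypothetical data generator: n rows (x, y) at one node.
    The covariate mean differs per node to mimic heterogeneous data."""
    rows = []
    for _ in range(n):
        x = rng.gauss(mean, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * x)))  # assumed true model
        rows.append((x, 1 if rng.random() < p else 0))
    return rows


def local_summaries(rows, beta):
    """Runs at a node. Returns the local score vector and information
    matrix of the logistic log-likelihood at the current beta.
    Only these aggregates leave the node, never the rows themselves."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)


def fit_distributed(nodes, iterations=25):
    """Runs at the coordinator. Sums the node summaries and takes
    Newton-Raphson steps; node sample sizes may be uneven."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        summaries = [local_summaries(rows, beta) for rows in nodes]
        g0 = sum(s[0][0] for s in summaries)
        g1 = sum(s[0][1] for s in summaries)
        i00 = sum(s[1][0] for s in summaries)
        i01 = sum(s[1][1] for s in summaries)
        i11 = sum(s[1][2] for s in summaries)
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system (information matrix) * step = score.
        beta[0] += (i11 * g0 - i01 * g1) / det
        beta[1] += (i00 * g1 - i01 * g0) / det
    return beta


rng = random.Random(0)
# Three nodes with uneven sizes and different covariate distributions.
nodes = [simulate_node(n, mu, rng) for n, mu in [(50, -1.0), (200, 0.0), (1000, 1.0)]]
beta_distributed = fit_distributed(nodes)
# Reference fit on the pooled rows, which a real deployment could not compute.
beta_pooled = fit_distributed([[row for rows in nodes for row in rows]])
print(beta_distributed, beta_pooled)
```

Because the score and information matrix are sums over individuals, adding them across nodes gives the same update as the pooled analysis, up to floating-point rounding. The quantities exchanged at each iteration are themselves summary statistics, so this sketch says nothing about the confidentiality risk that the conclusions flag for further analysis.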


Source journal: JMIR Medical Informatics (Medicine-Health Informatics)

CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.