PED-DATA: A Privacy-Preserving Framework for Data-Driven, Pediatric Multi-Center Studies.

Studies in Health Technology and Informatics. Publication date: 2025-09-03. DOI: 10.3233/SHTI251409

Gorkem Yilmaz, Jonathan M Mang, Markus Metzler, Hans-Ulrich Prokosch, Manfred Rauh, Jakob Zierk

Volume 331, pages 307-317. https://doi.org/10.3233/SHTI251409

Abstract

Introduction: Data-driven analysis of clinical databases is an efficient method for clinical knowledge generation, and it is especially suitable where exceptional ethical and practical restrictions apply, such as in pediatrics. In the multi-center PEDREF 2.0 study, we are analyzing children's laboratory test results, diagnoses, and procedures from more than 20 German tertiary care centers to establish pediatric reference intervals. The PEDREF 2.0 study uses the framework of the German Medical Informatics Initiative, but the study's specific needs require a customized module for distributed pediatric analyses.

Methods: We developed the Pediatric Distributed Analysis, Anonymization, and Aggregation Module (PED-DATA), which is a containerized application that we deployed to all participating centers. PED-DATA transforms the input datasets to a harmonized internal representation and enables their decentralized analysis in compliance with data protection rules, resulting in an anonymous output dataset that is transferred for central analysis.
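
The abstract does not describe PED-DATA's internals, so the following is only a minimal sketch of the general pattern it names: harmonize local records, aggregate them on site, and export only anonymous summaries. All field names (`pid`, `test`, `age_days`, `value`), the one-year age bins, and the `MIN_PATIENTS` suppression threshold are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import median

# Hypothetical small-cell suppression threshold (an assumption, not from the paper).
MIN_PATIENTS = 5


def harmonize(raw):
    """Map one site-specific record to a common internal form (assumed fields)."""
    return {
        "patient": str(raw["pid"]),
        "analyte": raw["test"].strip().lower(),
        "age_years": int(raw["age_days"] // 365.25),
        "value": float(raw["value"]),
    }


def aggregate(records, min_patients=MIN_PATIENTS):
    """Group results by analyte and age bin; return summaries without patient IDs."""
    cells = defaultdict(lambda: {"patients": set(), "values": []})
    for rec in map(harmonize, records):
        cell = cells[(rec["analyte"], rec["age_years"])]
        cell["patients"].add(rec["patient"])
        cell["values"].append(rec["value"])

    output = []
    for (analyte, age), cell in sorted(cells.items()):
        # Drop cells backed by too few patients to limit re-identification risk.
        if len(cell["patients"]) < min_patients:
            continue
        output.append({
            "analyte": analyte,
            "age_years": age,
            "n_results": len(cell["values"]),
            "n_patients": len(cell["patients"]),
            "median": median(cell["values"]),
        })
    return output
```

In this sketch only the rows returned by `aggregate` would leave a center; identifiers and individual values stay local. A real deployment would need a formally reviewed anonymization concept rather than a single count threshold.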

Results: In a preliminary analysis of data from 15 centers, we analyzed 52,807,236 laboratory test results from 753,774 different patients (323,943 to 4,338,317 test results per laboratory test), enabling us to establish pediatric reference intervals with previously unmatched precision.
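
For readers unfamiliar with the term, a reference interval is conventionally reported as the central 95% of values in a reference population, that is, the 2.5th to 97.5th percentiles. The sketch below shows only that conventional nonparametric definition. It is not the estimation method of PEDREF 2.0, which the abstract does not specify, and data-driven approaches on clinical databases must additionally account for pathological results, which this example ignores. The function name is hypothetical.

```python
from statistics import quantiles


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentiles of a sample of test results."""
    # n=40 yields 39 cut points spaced 2.5 percentage points apart:
    # the first is the 2.5th percentile and the last is the 97.5th.
    cuts = quantiles(values, n=40)
    return cuts[0], cuts[-1]
```

Because the outer percentiles are estimated from the tails of the distribution, their precision depends strongly on sample size, which is why large multi-center datasets are valuable for this purpose.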

Conclusion: PED-DATA facilitates the implementation of pediatric data-driven multi-center studies in a decentralized and privacy-respecting manner, and its use throughout German university hospitals in the PEDREF 2.0 study demonstrates its usefulness in a real-world use case.
