Federated Analysis With Differential Privacy in Oncology Research: Longitudinal Observational Study Across Hospital Data Warehouses.

IF 3.8 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics)
Théo Ryffel, Perrine Créquit, Maëlle Baillet, Jason Paumier, Yasmine Marfoq, Olivier Girardot, Thierry Chanet, Ronan Sy, Louise Bayssat, Julien Mazières, Vincent Vuiblet, Julien Ancel, Maxime Dewolf, François Margraff, Camille Bachot, Jacek Chmiel
JMIR Medical Informatics, vol. 13, e59685. Published 2025-07-31. DOI: 10.2196/59685 (https://doi.org/10.2196/59685). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312987/pdf/

Abstract

Background: Federated analytics in health care allows researchers to perform statistical queries on remote datasets without access to the raw data. This method arose from the need to perform statistical analysis on larger datasets collected at multiple health care centers while avoiding regulatory, governance, and privacy issues that might arise if raw data were collected at a central location outside the health care centers. Despite some pioneering work, federated analytics is still not widely used on real-world data, and to our knowledge, no real-world study has yet combined it with other privacy-enhancing techniques such as differential privacy (DP).

Objective: The first objective of this study was to deploy a federated architecture in a real-world setting. The oncology study used for this deployment compared the medical management of patients with metastatic non-small cell lung cancer before and after the first wave of the COVID-19 pandemic. The second objective was to test DP in this real-world scenario to assess its practicality and usefulness as a privacy-enhancing technology.

Methods: A federated architecture platform was set up in the Toulouse, Reims, and Foch centers. After harmonization of the data in each center, statistical analyses were performed using DataSHIELD (Data aggregation through anonymous summary-statistics from harmonized individual-level databases), a federated analysis R library. A new open-source DP package for DataSHIELD, dsPrivacy, was also implemented.
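
Illustrative note: the idea of a federated statistical query can be sketched in a few lines. The Python below is not the DataSHIELD API and not the study's code; it is a minimal sketch, assuming each site releases only a count and a sum, and that a hypothetical minimum group size (`min_count`) guards against disclosing very small groups. All names and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate a site agrees to release: no patient-level rows."""
    count: int
    total: float


def local_summary(values: Sequence[float], min_count: int = 5) -> SiteSummary:
    """Runs inside a hospital; refuses to answer for very small groups.

    min_count is a hypothetical disclosure threshold chosen for this sketch.
    """
    if len(values) < min_count:
        raise ValueError("group too small to disclose")
    return SiteSummary(count=len(values), total=float(sum(values)))


def pooled_mean(summaries: List[SiteSummary]) -> float:
    """Runs on the analyst side; sees only per-site counts and sums."""
    n = sum(s.count for s in summaries)
    return sum(s.total for s in summaries) / n


# Synthetic values (hypothetical, not study data), one list per site.
sites = {
    "site_a": [10.0, 12.0, 14.0, 16.0, 18.0],
    "site_b": [20.0, 22.0, 24.0, 26.0, 28.0, 30.0],
}
summaries = [local_summary(v) for v in sites.values()]
result = pooled_mean(summaries)
print(result)
```

The pooled mean equals the mean that would be obtained on the combined rows, yet the analyst side only ever handles two numbers per site.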

Results: A total of 50 patients were enrolled in the Toulouse and Reims centers and 49 in the Foch center. We have shown that DataSHIELD is a practical tool to efficiently conduct our study across all 3 centers without exposing data on a central node, once a sufficient setup has been established to configure a secure network between hospitals. All planned aggregated results were successfully generated. We also observed that DP can be implemented in practice with promising trade-offs between privacy and accuracy, and we built a library that will prove useful for future work.
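
Illustrative note: the privacy and accuracy trade-off mentioned above can be made concrete with the standard Laplace mechanism. The sketch below is not the dsPrivacy implementation; it assumes a bounded numeric variable, a publicly known group size, and neighboring datasets that differ in one record, so the mean has sensitivity (upper - lower) / n. The function names, bounds, and synthetic values are hypothetical.

```python
import random
from typing import Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values: Sequence[float], lower: float, upper: float,
            epsilon: float, rng: random.Random) -> float:
    """Epsilon-DP mean of values clipped to [lower, upper], with n assumed public."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n  # max change from replacing one record
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


# Synthetic values (hypothetical, not study data): 50 numbers in [0, 100].
rng = random.Random(0)
values = [float(40 + i) for i in range(50)]
true_mean = sum(values) / len(values)


def mean_abs_error(epsilon: float, trials: int = 2000) -> float:
    """Average absolute error of the noisy mean over repeated releases."""
    errors = [abs(dp_mean(values, 0.0, 100.0, epsilon, rng) - true_mean)
              for _ in range(trials)]
    return sum(errors) / trials


error_strict = mean_abs_error(0.1)  # stronger privacy, more noise
error_loose = mean_abs_error(1.0)   # weaker privacy, less noise
print(error_strict, error_loose)
```

With these settings the noise scale is 20 for epsilon = 0.1 and 2 for epsilon = 1.0, and the expected absolute error of Laplace noise equals its scale, so a smaller epsilon buys stronger privacy at the cost of a proportionally noisier answer. With only about 50 patients per site, the bounds and epsilon have to be chosen with care.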

Conclusions: The federated architecture platform made it possible to run a multicenter study on real-world oncology data while ensuring strong privacy guarantees using differential privacy.

