Data harmonization and federated learning for multi-cohort dementia research using the OMOP common data model: A Netherlands consortium of dementia cohorts case study

Pedro Mateus, Justine Moonen, Magdalena Beran, Eva Jaarsma, Sophie M. van der Landen, Joost Heuvelink, Mahlet Birhanu, Alexander G.J. Harms, Esther Bron, Frank J. Wolters, Davy Cats, Hailiang Mei, Julie Oomens, Willemijn Jansen, Miranda T. Schram, Andre Dekker, Inigo Bermejo
Journal of Biomedical Informatics, Volume 155, Article 104661. Published 2024-05-26. DOI: 10.1016/j.jbi.2024.104661
https://www.sciencedirect.com/science/article/pii/S1532046424000790

Abstract

Background

Establishing collaborations between cohort studies has been fundamental for progress in health research. However, such collaborations are hampered by heterogeneous data representations across cohorts and legal constraints on data sharing. The first arises from a lack of consensus on standards of data collection and representation across cohort studies and is usually tackled by applying data harmonization processes. The second is increasingly important due to increased awareness of privacy protection and stricter regulations, such as the GDPR. Federated learning has emerged as a privacy-preserving alternative to transferring data between institutions: the data are analyzed in a decentralized manner and stay where they are held.
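
To make the idea of decentralized analysis concrete, the following is a minimal sketch of a federated mean, assuming each site shares only a count and a sum rather than participant-level rows. The names `LocalSummary`, `local_summary`, and `federated_mean`, and the example values, are hypothetical illustrations and are not taken from the infrastructure used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LocalSummary:
    """Aggregate statistics a site shares (no participant-level rows)."""
    n: int
    total: float


def local_summary(values: Iterable[float]) -> LocalSummary:
    """Runs inside one cohort's own environment, on its own data."""
    vals = list(values)
    return LocalSummary(n=len(vals), total=sum(vals))


def federated_mean(summaries: List[LocalSummary]) -> float:
    """Runs at the coordinating node; it only ever sees per-site aggregates."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / n


# Hypothetical ages held at three separate sites (made-up values).
site_a = [70.0, 72.0, 68.0]
site_b = [80.0, 76.0]
site_c = [65.0]

summaries = [local_summary(site) for site in (site_a, site_b, site_c)]
pooled_mean = federated_mean(summaries)
print(f"pooled mean age: {pooled_mean:.2f}")
```

The pooled mean equals the mean that would be obtained from the combined data, yet no individual value leaves a site. Real deployments typically add further safeguards, such as minimum cell sizes, which this sketch omits.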

Methods

In this study, we set up a federated learning infrastructure for a consortium of nine Dutch cohorts with data relevant to the etiology of dementia, including an extract, transform, and load (ETL) pipeline for data harmonization. Additionally, we assessed the challenges of transforming and standardizing cohort data using the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and evaluated our tool in one of the cohorts using federated algorithms.
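
As an illustration of what the transform step of such an ETL pipeline might look like, the sketch below reshapes one wide cohort record into long-format rows that use a few column names from the OMOP CDM MEASUREMENT table. It is not the authors' ETL tool. The cohort column names, the `CONCEPT_MAP` dictionary, and the concept IDs in it are hypothetical placeholders (chosen in the range above 2,000,000,000 that OMOP reserves for local concepts), and unmapped fields fall back to concept ID 0 with the original column name kept as the source value.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical mapping from cohort column names to concept IDs.
# These are placeholders, not real OMOP vocabulary lookups.
CONCEPT_MAP: Dict[str, int] = {
    "mmse_total": 2_000_000_001,
    "systolic_bp": 2_000_000_002,
}
NO_MATCHING_CONCEPT = 0  # OMOP convention for values without a mapped concept


def to_measurements(
    person_id: int,
    values: Dict[str, Optional[float]],
    visit_date: date,
) -> List[Dict[str, object]]:
    """Turn one wide cohort record into long-format MEASUREMENT-like rows."""
    records: List[Dict[str, object]] = []
    for column, value in values.items():
        if value is None:
            continue  # missing values produce no row
        records.append({
            "person_id": person_id,
            "measurement_concept_id": CONCEPT_MAP.get(column, NO_MATCHING_CONCEPT),
            "measurement_date": visit_date,
            "value_as_number": float(value),
            "measurement_source_value": column,  # original field name is retained
        })
    return records


# Hypothetical participant record with one missing and one unmapped field.
raw = {"mmse_total": 27, "systolic_bp": None, "cohort_specific_score": 3.5}
records = to_measurements(17, raw, date(2020, 1, 15))
for rec in records:
    print(rec)
```

Keeping the source value alongside the concept ID means that fields without a standard concept remain traceable after harmonization.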

Results

We successfully applied our ETL tool and observed complete coverage of the cohorts’ data by the OMOP CDM. The OMOP CDM facilitated data representation and standardization, but we identified limitations for cohort-specific data fields and in the scope of the available vocabularies. Specific challenges arise in a multi-cohort federated collaboration due to technical constraints in local environments, data heterogeneity, and lack of direct access to the data.

Conclusion

In this article, we describe our solutions to the challenges and limitations encountered in our study. Our study shows the potential of federated learning as a privacy-preserving solution for multi-cohort studies that enhances reproducibility and reuse of both data and analyses.

