Privacy-preserving record linkage across disparate institutions and datasets to enable a learning health system: The National COVID Cohort Collaborative (N3C) experience

Impact factor: 2.6; JCR quartile: Q2 (Health Policy & Services)
Umberto Tachinardi, Shaun J. Grannis, Sam G. Michael, Leonie Misquitta, Jayme Dahlin, Usman Sheikh, Abel Kho, Jasmin Phua, Sara S. Rogovin, Benjamin Amor, Maya Choudhury, Philip Sparks, Amin Mannaa, Saad Ljazouli, Joel Saltz, Fred Prior, Ahmen Baghal, Kenneth Gersing, Peter J. Embi
Learning Health Systems, 8(1). Journal article, published 2024-01-11. DOI: 10.1002/lrh2.10404
https://onlinelibrary.wiley.com/doi/10.1002/lrh2.10404
Citation count: 0

Abstract

Introduction

Research driven by real-world clinical data is increasingly vital to enabling learning health systems, but integrating such data from across disparate health systems is challenging. As part of the NCATS National COVID Cohort Collaborative (N3C), the N3C Data Enclave was established as a centralized repository of deidentified and harmonized COVID-19 patient data from institutions across the US. However, making this data most useful for research requires linking it with information such as mortality data, images, and viral variants. The objective of this project was to establish privacy-preserving record linkage (PPRL) methods to ensure that patient-level EHR data remains secure and private when governance-approved linkages with other datasets occur.

Methods

Separate agreements and approval processes govern N3C data contribution and data access. The Linkage Honest Broker (LHB), an independent neutral party (the Regenstrief Institute), ensures data linkages are robust and secure by adding an extra layer of separation between protected health information and clinical data. The LHB's PPRL methods (including algorithms, processes, and governance) match patient records using “deidentified tokens,” which are hashed combinations of identifier fields that define a match across data repositories without using patients' clear-text identifiers.
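The idea of a "deidentified token" can be illustrated with a minimal Python sketch. The abstract does not specify which identifier fields are combined, how they are normalized, or which hash function is used, so everything below is an assumption made for illustration: the field names, the two recipes in `TOKEN_RECIPES`, the `normalize` rules, and the choice of keyed HMAC-SHA256 are hypothetical and are not the LHB's actual method.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical token recipes: each token is built from one combination of
# identifier fields. The real combinations are not given in the abstract.
TOKEN_RECIPES = {
    "T1": ("last_name", "first_name", "dob", "sex"),
    "T2": ("last_name", "dob", "zip5"),
}


def normalize(value: str) -> str:
    """Uppercase and drop accents, spaces, and punctuation so that trivial
    formatting differences between sites do not change the hash."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def make_tokens(record: dict, secret_key: bytes) -> dict:
    """Return {recipe_name: hex digest} for every recipe whose fields are all
    present. Only these digests, never the clear-text fields, would be shared."""
    tokens = {}
    for name, fields in TOKEN_RECIPES.items():
        parts = [normalize(record.get(field, "")) for field in fields]
        if not all(parts):
            continue  # a missing field would make the token unreliable
        message = "|".join(parts).encode("utf-8")
        # Assumption: a keyed hash, so tokens cannot be rebuilt without the key.
        tokens[name] = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return tokens
```

Two sites that hold the same patient with slightly different formatting (for example "O'Neil" and "o neil") would produce identical tokens under this sketch, provided they share the same key and recipes. That shared output is what lets a third party compare records without seeing the identifiers themselves.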

Results

These methods enable three linkage functions: Deduplication, Linking Multiple Datasets, and Cohort Discovery. To date, two external repositories have been cross-linked. As of March 1, 2023, 43 sites have signed the LHB Agreement; 35 sites have sent tokens generated for 9 528 998 patients. In this initial cohort, the LHB identified 135 037 matches and 68 596 duplicates.
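As a rough sketch of how the three linkage functions could operate on tokens alone, the Python below groups records by token. It assumes, for illustration only, that a "duplicate" is the same token appearing more than once within one dataset and a "match" is the same token appearing in more than one dataset; the abstract does not define these terms, and the function names and the `(dataset, record_id, token)` row layout are hypothetical.

```python
from collections import defaultdict


def group_by_token(rows):
    """rows: iterable of (dataset, record_id, token).
    Returns {token: set of (dataset, record_id)}."""
    groups = defaultdict(set)
    for dataset, record_id, token in rows:
        groups[token].add((dataset, record_id))
    return groups


def classify(rows):
    """Split token groups into within-dataset duplicates and
    cross-dataset matches (assumed definitions, see lead-in)."""
    duplicates, matches = [], []
    for token, members in group_by_token(rows).items():
        by_dataset = defaultdict(list)
        for dataset, record_id in sorted(members):
            by_dataset[dataset].append(record_id)
        # Deduplication: one dataset holds several records with the same token.
        for dataset, ids in by_dataset.items():
            if len(ids) > 1:
                duplicates.append((dataset, ids))
        # Linking multiple datasets: the token occurs in more than one dataset.
        if len(by_dataset) > 1:
            matches.append(sorted(members))
    return duplicates, matches


def cohort_overlap(cohort_tokens, repository_tokens):
    """Cohort discovery: count how many cohort tokens also occur in a
    repository, without exchanging any record-level data."""
    return len(set(cohort_tokens) & set(repository_tokens))
```

In this sketch the party running `classify` only ever sees dataset labels, local record IDs, and hashed tokens, which is consistent with the separation between identifiers and clinical data described in the Methods.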

Conclusion

This large-scale linkage study using deidentified datasets of varying characteristics established secure methods for protecting the privacy of N3C patient data when linked for research purposes. This technology has potential for use with registries for other diseases and conditions.

Source journal: Learning Health Systems (Health Policy & Services)
CiteScore: 5.60
Self-citation rate: 22.60%
Articles published: 55
Review time: 20 weeks