Record linkage without patient identifiers: Proof of concept using data from South Africa's national HIV program.

PLOS Global Public Health. Published 2025-07-09; eCollection date 2025-01-01. DOI: 10.1371/journal.pgph.0004835
Khumbo Shumba, Jacob Bor, Cornelius Nattey, Dickman Gareta, Evelyn Lauren, William Macleod, Matthew P Fox, Adrian Puren, Koleka Mlisana, Dorina Onoya
Volume 5, Issue 7, e0004835. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240394/pdf/

Abstract

Linkage between health databases typically requires patient identifiers such as names and personal identification numbers. We developed and validated a record linkage strategy that combines administrative health databases from South Africa's public-sector HIV program without using identifiers. We linked CD4 counts and HIV viral loads from South Africa's TIER.Net with the National Health Laboratory Service (NHLS) database for patients receiving care in Ekurhuleni District (Gauteng Province) between 2015 and 2019. The linkage variables were result value, specimen collection date, facility of collection, year and month of birth, and sex. We used three matching strategies: exact matching, which requires identical values on all linkage variables; caliper matching, which allows a ±5-day window on the result date; and specimen barcode matching, which uses unique specimen identifiers. A sequential linkage approach applied barcode matching first, then exact matching, then caliper matching. Exact and caliper matching were validated against barcode matches, which served as a "gold standard" (barcodes were available for 34% of TIER.Net records). Performance measures were sensitivity, positive predictive value (PPV), the share of patients linked, and the percentage increase in data points. We attempted to link 2,017,290 laboratory test results from TIER.Net (523,558 unique patients) with 2,414,059 NHLS test results. Exact matching achieved 69.0% sensitivity and 95.1% PPV; caliper matching achieved 75% sensitivity and 94.5% PPV. Sequential linkage matched 71.9% (95% CI: 71.9, 72.0) of test results overall, with 96.8% (95% CI: 96.7, 97.1) PPV and 85.9% (95% CI: 85.7, 85.9) sensitivity. Of these matches, 41.9% were made on specimen barcode, 51.3% by exact matching, and 6.8% by caliper matching. This linked 86.0% (95% CI: 85.9, 86.1) of TIER.Net patients to the NHLS (N = 1,450,087) and increased the number of laboratory results in TIER.Net by 62.6%. Linking TIER.Net and NHLS without patient identifiers achieved high accuracy and yield without compromising privacy. The integrated cohort provides a more complete laboratory test history and supports more accurate estimates of HIV program indicators.
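The three matching rules and their sequential combination can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The `LabRecord` layout, the function names, and the single `test_date` field are hypothetical; `test_date` stands in for the specimen collection/result date. The code also assumes a record is linked only when exactly one unused NHLS candidate qualifies, because the abstract does not say how ties were resolved. Records are blocked on the non-date linkage variables, so caliper matching only compares dates within a block. The hypothetical `validate` helper computes sensitivity and PPV against barcode-derived matches, in the spirit of the gold-standard comparison described in the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional, Set, Tuple

@dataclass(frozen=True)
class LabRecord:
    record_id: str
    result_value: int              # e.g. CD4 count or viral load, as recorded
    test_date: date                # hypothetical single date field (assumption)
    facility: str
    birth_year: int
    birth_month: int
    sex: str
    barcode: Optional[str] = None  # specimen barcode, when available

Links = Dict[str, Tuple[str, str]]  # tier_id -> (nhls_id, method)

def non_date_key(r: LabRecord) -> tuple:
    """Linkage variables other than the date, used as a blocking key."""
    return (r.result_value, r.facility, r.birth_year, r.birth_month, r.sex)

def _link_unique(tier: List[LabRecord], nhls: List[LabRecord],
                 key: Callable[[LabRecord], object],
                 ok: Callable[[LabRecord, LabRecord], bool],
                 method: str, links: Links, used: Set[str]) -> None:
    """Link each still-unlinked TIER.Net record to an unused NHLS record that
    shares `key` and passes `ok`, but only if exactly one candidate exists."""
    index: Dict[object, List[LabRecord]] = {}
    for n in nhls:
        if n.record_id not in used:
            index.setdefault(key(n), []).append(n)
    for t in tier:
        if t.record_id in links:
            continue
        cands = [n for n in index.get(key(t), [])
                 if n.record_id not in used and ok(t, n)]
        if len(cands) == 1:  # assumption: ambiguous cases stay unlinked
            links[t.record_id] = (cands[0].record_id, method)
            used.add(cands[0].record_id)

def sequential_link(tier: List[LabRecord], nhls: List[LabRecord],
                    caliper_days: int = 5, use_barcode: bool = True) -> Links:
    """Barcode, then exact, then caliper (+/- caliper_days) matching."""
    links: Links = {}
    used: Set[str] = set()
    if use_barcode:
        _link_unique([t for t in tier if t.barcode], [n for n in nhls if n.barcode],
                     lambda r: r.barcode, lambda t, n: True, "barcode", links, used)
    _link_unique(tier, nhls, non_date_key,
                 lambda t, n: t.test_date == n.test_date, "exact", links, used)
    _link_unique(tier, nhls, non_date_key,
                 lambda t, n: abs((t.test_date - n.test_date).days) <= caliper_days,
                 "caliper", links, used)
    return links

def barcode_truth(tier: List[LabRecord],
                  nhls: List[LabRecord]) -> Dict[str, Optional[str]]:
    """Gold standard for TIER.Net records with a barcode: the NHLS record with
    the same barcode, or None when there is no unique barcode match."""
    by_code: Dict[str, List[str]] = {}
    for n in nhls:
        if n.barcode:
            by_code.setdefault(n.barcode, []).append(n.record_id)
    truth: Dict[str, Optional[str]] = {}
    for t in tier:
        if t.barcode:
            ids = by_code.get(t.barcode, [])
            truth[t.record_id] = ids[0] if len(ids) == 1 else None
    return truth

def validate(links: Links, truth: Dict[str, Optional[str]]) -> Tuple[float, float]:
    """Sensitivity and PPV of `links`, evaluated only on records in `truth`."""
    made = {t: links[t][0] for t in truth if t in links}
    tp = sum(1 for t, n in made.items() if truth[t] == n)
    positives = sum(1 for n in truth.values() if n is not None)
    sensitivity = tp / positives if positives else float("nan")
    ppv = tp / len(made) if made else float("nan")
    return sensitivity, ppv
```

To score exact and caliper matching on their own, one could call `sequential_link(tier, nhls, use_barcode=False)` on the TIER.Net records that carry a barcode. The result then goes to `validate` together with `barcode_truth(tier, nhls)`. Under the same assumptions, any link made for a barcoded record with no barcode match in NHLS counts as a false positive.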
