Record linkage without patient identifiers: Proof of concept using data from South Africa's national HIV program
Khumbo Shumba, Jacob Bor, Cornelius Nattey, Dickman Gareta, Evelyn Lauren, William Macleod, Matthew P Fox, Adrian Puren, Koleka Mlisana, Dorina Onoya
PLOS Global Public Health, 5(7): e0004835. Published 2025-07-09. DOI: 10.1371/journal.pgph.0004835 (https://doi.org/10.1371/journal.pgph.0004835)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240394/pdf/
Abstract
Linkage between health databases typically requires patient identifiers such as names and personal identification numbers. We developed and validated a record linkage strategy to combine administrative health databases without identifiers for South Africa's public sector HIV program. We linked CD4 counts and HIV viral loads from South Africa's TIER.Net with the National Health Laboratory Service (NHLS) database for patients receiving care between 2015 and 2019 in Ekurhuleni District (Gauteng Province). Linkage variables were result value, specimen collection date, facility of collection, year and month of birth, and sex. We used three matching strategies: exact matching on all linkage variables, caliper matching allowing a ±5-day window on result date, and specimen barcode matching using unique specimen identifiers. A sequential linkage approach applied specimen barcode matching first, followed by exact matching and then caliper matching. Exact and caliper matching were validated using barcodes (available for 34% of records in TIER.Net) as a "gold standard". Performance measures were sensitivity, positive predictive value (PPV), share of patients linked, and percent increase in data points. We attempted to link 2,017,290 laboratory test results from TIER.Net (523,558 unique patients) with 2,414,059 NHLS test results. Exact matching achieved 69.0% sensitivity and 95.1% PPV. Caliper matching achieved 75% sensitivity and 94.5% PPV. Among results matched by sequential linkage, 41.9% were matched using specimen barcodes, 51.3% through exact matching, and 6.8% through caliper matching. Overall, 71.9% (95% CI: 71.9, 72.0) of test results were matched, with 96.8% (95% CI: 96.7, 97.1) PPV and 85.9% (95% CI: 85.7, 85.9) sensitivity. This linked 86.0% (95% CI: 85.9, 86.1) of TIER.Net patients to the NHLS (N = 1,450,087), increasing laboratory results in TIER.Net by 62.6%. Linkage of TIER.Net and NHLS without patient identifiers attained high accuracy and yield without compromising privacy. The integrated cohort provides a more complete laboratory test history and supports more accurate HIV program indicator estimates.
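
The sequential strategy described in the abstract (barcode, then exact, then caliper matching) and the barcode-based validation can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record layout, the field names, the function names `sequential_link` and `sensitivity_ppv`, the one-to-one matching rule, the nearest-date tie-break, and the use of a single `date` field for both exact and caliper steps are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record layout (not from the paper): each test result is a dict
# with keys id, barcode (or None), value, date, facility, birth_ym, sex.
FIELDS = ("value", "facility", "birth_ym", "sex")


def exact_key(r):
    """All linkage variables, including the date."""
    return tuple(r[f] for f in FIELDS) + (r["date"],)


def caliper_key(r):
    """Linkage variables without the date; the date is compared separately."""
    return tuple(r[f] for f in FIELDS)


def sequential_link(tier, nhls, caliper_days=5):
    """Link records in three passes: barcode, then exact, then caliper.

    Returns {tier_id: (nhls_id, step)}. One-to-one matching is an assumption.
    """
    links, used = {}, set()

    def claim(t, n, step):
        links[t["id"]] = (n["id"], step)
        used.add(n["id"])

    # Step 1: specimen barcode matching.
    by_barcode = {n["barcode"]: n for n in nhls if n["barcode"]}
    for t in tier:
        n = by_barcode.get(t["barcode"]) if t["barcode"] else None
        if n is not None and n["id"] not in used:
            claim(t, n, "barcode")

    by_exact, by_caliper = defaultdict(list), defaultdict(list)
    for n in nhls:
        by_exact[exact_key(n)].append(n)
        by_caliper[caliper_key(n)].append(n)

    # Step 2: exact matching on every linkage variable.
    for t in tier:
        if t["id"] in links:
            continue
        free = [n for n in by_exact.get(exact_key(t), []) if n["id"] not in used]
        if free:
            claim(t, free[0], "exact")

    # Step 3: caliper matching, dates may differ by up to caliper_days.
    for t in tier:
        if t["id"] in links:
            continue
        free = [n for n in by_caliper.get(caliper_key(t), [])
                if n["id"] not in used
                and abs((n["date"] - t["date"]).days) <= caliper_days]
        if free:
            # Assumed tie-break: take the candidate closest in date.
            best = min(free, key=lambda n: abs((n["date"] - t["date"]).days))
            claim(t, best, "caliper")
    return links


def sensitivity_ppv(predicted, truth):
    """Score predicted links ({tier_id: nhls_id}) against barcode-derived truth.

    Predictions for records without a known true link are not scored.
    """
    tp = sum(1 for t, n in predicted.items() if truth.get(t) == n)
    fp = sum(1 for t, n in predicted.items() if t in truth and truth[t] != n)
    fn = len(truth) - tp
    sens = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv


def rec(rid, barcode, value, d, facility="F1", birth_ym="1980-05", sex="F"):
    """Build one made-up record for the toy example below."""
    return dict(id=rid, barcode=barcode, value=value, date=d,
                facility=facility, birth_ym=birth_ym, sex=sex)


# Toy data, invented for illustration only.
tier = [rec("t1", "B1", 350, date(2017, 3, 1)),
        rec("t2", None, 500, date(2017, 4, 10)),
        rec("t3", None, 120, date(2018, 1, 15)),
        rec("t4", None, 999, date(2018, 2, 1))]
nhls = [rec("n1", "B1", 350, date(2017, 3, 1)),
        rec("n2", "B2", 500, date(2017, 4, 10)),
        rec("n3", "B3", 120, date(2018, 1, 18)),
        rec("n4", "B4", 999, date(2018, 2, 20))]
links = sequential_link(tier, nhls)
```

In this toy data, `t1` links by barcode, `t2` by exact match, `t3` by caliper match (dates 3 days apart), and `t4` stays unlinked because its only candidate is 19 days away. To mimic the validation design in the abstract, one would run the exact or caliper step on the subset of records that have barcodes while ignoring the barcode, then pass those links and the barcode-derived links to `sensitivity_ppv`.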