Record linkage for routinely collected health data in an African health information exchange.

IF 2.2 Q3 HEALTH CARE SCIENCES & SERVICES

International Journal of Population Data Science. Pub Date: 2023-02-28. eCollection Date: 2023-01-01. DOI: 10.23889/ijpds.v6i1.1771

Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle, Nicki Tiffin



Abstract

Introduction: The Patient Master Index (PMI) plays an important role in the management of patient information and in epidemiological research, and the availability of unique patient identifiers improves accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi-identifiers are used to attempt to link patient records but sometimes present a higher risk of over-linking. Data quality and completeness thus affect the ability to make correct linkages.

Aim: This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Methods: We apply a stepwise deterministic record linkage approach to link patient data that are routinely collected from health information systems in the Western Cape province of South Africa. Variables used in the linkage process include the South African National Identity number (RSA ID), date of birth, year of birth, month of birth, day of birth, residential address and contact information. Descriptive analyses are used to estimate the level and extent of duplication in the provincial PMI.
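To make the idea of stepwise deterministic linkage concrete, here is a minimal Python sketch. It is not the PHDC implementation: the abstract does not list the actual rules or their order, so the `Patient` fields, the three steps in `STEPS`, and the sample records (including the placeholder ID and phone numbers) are all assumptions chosen for illustration. Each step builds an exact-match key from a different combination of fields, and a later, looser step is tried only when the stricter ones fail or lack the data they need.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout; the real PMI schema is not given in the abstract.
    record_id: str
    rsa_id: Optional[str]          # frequently missing in routine data
    surname: str
    first_name: str
    date_of_birth: str             # ISO format, e.g. "1985-03-14"
    phone: Optional[str] = None


def norm(value: Optional[str]) -> str:
    """Lower-case and drop all whitespace so trivial differences do not block a match."""
    return "".join((value or "").lower().split())


def key_rsa_id(p: Patient) -> Optional[tuple]:
    rsa = norm(p.rsa_id)
    return (rsa,) if rsa else None


def key_name_dob(p: Patient) -> Optional[tuple]:
    parts = (norm(p.surname), norm(p.first_name), norm(p.date_of_birth))
    return parts if all(parts) else None


def key_surname_dob_phone(p: Patient) -> Optional[tuple]:
    parts = (norm(p.surname), norm(p.date_of_birth), norm(p.phone))
    return parts if all(parts) else None


# Assumed ordering: strictest rule first, loosest last.
STEPS: List[Tuple[str, Callable[[Patient], Optional[tuple]]]] = [
    ("rsa_id", key_rsa_id),
    ("name_dob", key_name_dob),
    ("surname_dob_phone", key_surname_dob_phone),
]


def link(incoming: Patient, index: List[Patient]) -> Optional[Tuple[str, str]]:
    """Return (matched record_id, step name) for the first step that matches, else None."""
    for step_name, make_key in STEPS:
        key = make_key(incoming)
        if key is None:            # a required field is missing, so skip this step
            continue
        for candidate in index:
            if make_key(candidate) == key:
                return candidate.record_id, step_name
    return None


# Made-up index records; the ID and phone numbers are placeholders, not real values.
index = [
    Patient("A1", "0000000000001", "Dlamini", "Thandi", "1985-03-14", "0210000001"),
    Patient("A2", None, "Jacobs", "Pieter", "1990-07-02", "0210000002"),
]

print(link(Patient("N1", "0000000000001", "Dhlamini", "Thandi", "1985-03-14"), index))
print(link(Patient("N2", None, " jacobs", "PIETER", "1990-07-02"), index))
print(link(Patient("N3", None, "Jacobs", "Piet", "1990-07-02", "021 000 0002"), index))
print(link(Patient("N4", None, "Mokoena", "Lerato", "2001-11-30"), index))
```

In this sketch the first incoming record links on the ID despite a misspelt surname, the second links on name and date of birth, the third falls through to the loosest step, and the fourth finds no match. The linear scan over `index` keeps the example short; a production system would normally look keys up in an index or database instead.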

Results: The percentage of duplicates in the provincial PMI lies between 10% and 20%. Duplicates arise mainly from spelling errors. Surnames and first names carry most of these errors: the first names and surname differ for the same individual in approximately 22% of duplicates. The RSA ID is the variable most affected by poor completeness, with fewer than 30% of records having an RSA ID.

Conclusion: The current linkage algorithm requires refinement because it relies on algorithms that were developed and validated on anglicised names and might not work well for local names. Linkage is also affected by data quality issues that stem from the routine nature of the data and often make it difficult to validate and enforce integrity at the point of data capture.
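The two descriptive measures reported in the Results, field completeness and the share of duplicates whose names disagree, can be computed along the following lines. This is a hedged sketch rather than the authors' analysis code: the dictionary field names, the helper functions, and the sample records are hypothetical. The abstract also does not say whether a pair counts as discrepant when either name field differs or only when both do, so the sketch assumes "either".

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Share of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


def _norm(value: Optional[str]) -> str:
    """Lower-case and drop whitespace before comparing names."""
    return "".join((value or "").lower().split())


def names_differ(a: Record, b: Record) -> bool:
    """Assumption: a pair is discrepant if surname OR first name differs after normalisation."""
    return (
        _norm(a.get("surname")) != _norm(b.get("surname"))
        or _norm(a.get("first_name")) != _norm(b.get("first_name"))
    )


def name_discrepancy_rate(pairs: List[Tuple[Record, Record]]) -> float:
    """Share of known duplicate pairs whose names disagree."""
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if names_differ(a, b)) / len(pairs)


# Made-up records; the ID value is a placeholder.
records: List[Record] = [
    {"surname": "Dlamini", "first_name": "Thandi", "rsa_id": "0000000000001"},
    {"surname": "Dhlamini", "first_name": "Thandi", "rsa_id": None},
    {"surname": "Jacobs", "first_name": "Pieter", "rsa_id": ""},
    {"surname": "jacobs", "first_name": "Pieter", "rsa_id": None},
]
# Pairs assumed to have been confirmed as the same person by some other means.
duplicate_pairs = [(records[0], records[1]), (records[2], records[3])]

print(completeness(records, "rsa_id"))          # 0.25
print(name_discrepancy_rate(duplicate_pairs))   # 0.5
```

Only one of the four sample records carries an ID, and only the first pair differs in spelling once case is ignored, which gives the two printed values.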


