Record linkage for routinely collected health data in an African health information exchange.

IF 1.6 Q3 HEALTH CARE SCIENCES & SERVICES
International Journal of Population Data Science. Pub Date: 2023-02-28. eCollection Date: 2023-01-01. DOI: 10.23889/ijpds.v6i1.1771
Themba Mutemaringa, Alexa Heekes, Mariette Smith, Andrew Boulle, Nicki Tiffin
Citations: 0

Abstract


Record linkage for routinely collected health data in an African health information exchange.

Introduction: The Patient Master Index (PMI) plays an important role in the management of patient information and in epidemiological research, and the availability of unique patient identifiers improves accuracy when linking patient records across disparate datasets. In our environment, however, a unique identifier is seldom present in all datasets containing patient information. Quasi-identifiers are used to attempt to link patient records, but they sometimes carry a higher risk of over-linking. Data quality and completeness thus affect the ability to make correct linkages.

Aim: This paper describes the record linkage system that is currently implemented at the Provincial Health Data Centre (PHDC) in the Western Cape, South Africa, and assesses its output to date.

Methods: We apply a stepwise deterministic record linkage approach to link patient data that are routinely collected from health information systems in the Western Cape province of South Africa. Variables used in the linkage process include South African National Identity number (RSA ID), date of birth, year of birth, month of birth, day of birth, residential address and contact information. Descriptive analyses are used to estimate the level and extent of duplication in the provincial PMI.
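
The stepwise deterministic approach described above can be sketched roughly as follows. This is a minimal illustration, not the PHDC implementation: the pass order, the choice of fields in each pass, the `Patient` class, the `link` function and all sample values (including the placeholder RSA ID) are assumptions made for the example. Each pass applies an exact-match rule, the strictest rule runs first, and a record linked in an earlier pass is not revisited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patient:
    record_id: str
    rsa_id: Optional[str]   # frequently missing in routine data
    first_name: str
    surname: str
    date_of_birth: str      # ISO format, e.g. "1985-03-14"
    phone: Optional[str]


def norm(value: Optional[str]) -> Optional[str]:
    """Lower-case and trim a value; treat empty strings as missing."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


# Hypothetical pass order: strictest rule first, weaker rules later.
PASSES = [
    ("rsa_id", lambda p: (norm(p.rsa_id),)),
    ("name+dob", lambda p: (norm(p.first_name), norm(p.surname),
                            norm(p.date_of_birth))),
    ("surname+dob+phone", lambda p: (norm(p.surname), norm(p.date_of_birth),
                                     norm(p.phone))),
]


def link(incoming, index):
    """Return {incoming record_id: (matched index record_id, pass name)}."""
    links = {}
    for pass_name, key_fn in PASSES:
        # Build a lookup of index records for this pass, skipping incomplete keys.
        lookup = {}
        for rec in index:
            key = key_fn(rec)
            if None not in key:
                lookup.setdefault(key, []).append(rec.record_id)
        for rec in incoming:
            if rec.record_id in links:
                continue  # already linked by an earlier, stricter pass
            key = key_fn(rec)
            if None in key:
                continue  # a missing field disqualifies the record from this pass
            candidates = lookup.get(key, [])
            if len(candidates) == 1:
                # Ambiguous keys are left unlinked to limit over-linking.
                links[rec.record_id] = (candidates[0], pass_name)
    return links


# Invented sample data; the RSA ID is a placeholder, not a valid number.
index = [
    Patient("P1", "0000000000001", "Thandi", "Mokoena", "1985-03-14", "0215550001"),
    Patient("P2", None, "Sipho", "Dlamini", "1990-07-02", "0215550002"),
]
incoming = [
    Patient("A", "0000000000001", "Thandy", "Mokoena", "1985-03-14", None),
    Patient("B", None, "sipho ", "Dlamini", "1990-07-02", None),
    Patient("C", None, "S.", "Dlamini", "1990-07-02", "0215550002"),
    Patient("D", None, "Anna", "Smith", "2000-01-01", None),
]
links = link(incoming, index)
for record_id, (match, rule) in links.items():
    print(record_id, "->", match, "via", rule)
```

In this sketch a misspelled first name does not block a link when the RSA ID is present, while records without an RSA ID fall through to the weaker quasi-identifier passes, which is where both missed links and over-linking become more likely.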

Results: The percentage of duplicates in the provincial PMI lies between 10% and 20%. Duplicates mainly arise from spelling errors, and surnames and first names carry most of the errors, with the first names and surname differing for the same individual in approximately 22% of duplicates. The RSA ID is the variable most affected by poor completeness, with less than 30% of the records having an RSA ID. The current linkage algorithm requires refinement, as it makes use of algorithms that have been developed and validated on anglicised names and might not work well for local names. Linkage is also affected by data quality issues associated with the routine nature of the data, which often make it difficult to validate and enforce integrity at the point of data capture.
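
The abstract does not name the name-matching algorithms in use. As one illustration of the concern, the sketch below implements American Soundex, a widely used phonetic code designed around English surnames; choosing Soundex here is an assumption, and the example names were picked by the editor, not taken from the paper. Because the code keeps only the first letter and up to three consonant digits, long names that share a prefix receive the same code.

```python
# Standard American Soundex consonant groups.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (encoded + "000")[:4]


# Two spellings that Soundex was designed to bring together.
print(soundex("Robert"), soundex("Rupert"))

# Two different given names that share a prefix and end up with one code.
print(soundex("Ntombizodwa"), soundex("Ntombifuthi"))
```

A blocking or matching step keyed on such a code would treat the two distinct names in the second pair as phonetically equivalent, which is one way an algorithm tuned on anglicised names could contribute to over-linking.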

Source journal
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 386
Review time: 20 weeks