Laura Rossouw, Nkosinathi Ngcobo, Kate Clouse, Cornelius Nattey, Karl-Günter Technau, Mhairi Maskew
Augmenting maternal clinical cohort data with administrative laboratory dataset linkages: a validation study.
Discover health systems, 4(1):115. Published online 2025-09-15. DOI: https://doi.org/10.1007/s44250-025-00298-4
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12436568/pdf/
Abstract
Background: The use of big data and large language models in healthcare can play a key role in improving patient treatment and healthcare management, especially when applied to large-scale administrative data. A major challenge is ensuring that patient confidentiality and personal information are protected. One way to overcome this is to augment clinical data through linkages to administrative laboratory datasets that avoid the use of demographic information.
Methods: We explored an alternative method to examine patient files from a large administrative dataset in South Africa (the National Health Laboratory Services, or NHLS), by linking external data to the NHLS database using specimen barcodes associated with laboratory tests. This provides a deterministic way of performing data linkages without accessing demographic information. In this paper, we quantify the performance metrics of this approach.
Results: Linking the large NHLS dataset to external hospital data using specimen barcodes achieved a 95% success rate. Of the 1200 records in the validation sample, 87% were exact matches and 9% were matches with typographic correction. The remaining 5% were either complete mismatches or were due to duplicates in the administrative data.
Conclusions: The high success rate indicates that barcodes are a reliable way to link data without demographic identifiers. Specimen barcodes are an effective deterministic linkage tool that enables the creation of large linked datasets without compromising confidentiality.
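As an illustration only, the kind of deterministic barcode linkage described in the Methods, and the match categories reported in the Results, could be sketched as follows. This is not the authors' implementation: the function names, the record layout, the example barcodes, and the normalization rule (drop non-alphanumeric characters, uppercase) are all assumptions made for the sketch, and the abstract does not say what typographic corrections were actually applied.

```python
from collections import defaultdict


def normalize(barcode: str) -> str:
    """Hypothetical typographic correction: keep letters/digits, uppercase."""
    return "".join(ch for ch in barcode if ch.isalnum()).upper()


def link_by_barcode(external, lab):
    """Classify each external barcode against laboratory records.

    external: list of barcode strings from the clinical cohort (assumed layout).
    lab: list of (barcode, lab_record_id) tuples (assumed layout).
    Returns a list of (external_barcode, status, lab_record_id or None),
    where status is "exact", "corrected", "duplicate", or "unmatched".
    """
    exact_index = defaultdict(list)
    normalized_index = defaultdict(list)
    for barcode, record_id in lab:
        exact_index[barcode].append(record_id)
        normalized_index[normalize(barcode)].append(record_id)

    results = []
    for barcode in external:
        if barcode in exact_index:
            # The barcode appears verbatim in the laboratory data.
            hits, status = exact_index[barcode], "exact"
        else:
            # Fall back to the normalized form of the barcode.
            hits = normalized_index.get(normalize(barcode), [])
            status = "corrected"
        if len(hits) == 1:
            results.append((barcode, status, hits[0]))
        elif len(hits) > 1:
            # More than one laboratory record shares this barcode.
            results.append((barcode, "duplicate", None))
        else:
            results.append((barcode, "unmatched", None))
    return results


# Made-up barcodes, purely for demonstration.
lab_records = [("AB12345", 1), ("CD67890", 2), ("EF11111", 3), ("EF11111", 4)]
cohort_barcodes = ["AB12345", "cd-67890", "EF11111", "ZZ00000"]
linked = link_by_barcode(cohort_barcodes, lab_records)
for row in linked:
    print(row)
```

No demographic field is read at any point: the only join key is the barcode, which is what makes the linkage deterministic. A duplicated barcode is reported rather than resolved, since picking one of several laboratory records would require extra information.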