Improving opportunities for data linkage within Children Looked After administrative records in Wales.

IF 2.2 Q3 HEALTH CARE SCIENCES & SERVICES
International Journal of Population Data Science Pub Date : 2025-02-19 eCollection Date: 2025-01-01 DOI:10.23889/ijpds.v10i1.2383
Grace A Bailey, Alex Lee, Saira Ahmed, Ieuan Scanlon, Laura E Cowley, Amy Stuart, Ian Farr, Caroline Brooks, Laura North, Lucy J Griffiths

Abstract

Introduction: Linkage of population-based administrative data is a powerful tool for studying important public issues. To overcome confidentiality and disclosure issues, records are de-identified and allocated a unique identifier. Within the Secure Anonymised Information Linkage (SAIL) Databank, these identifiers are known as Anonymised Linking Fields (ALFs). Assignment of an ALF enables linkage of individuals across multiple routinely collected datasets. Within the Children Looked After (CLA) Wales dataset, only 37% of the children have an ALF, which limits linkage to other datasets and, as a result, potential research. There are also other known data issues, including discrepancies in week of birth, duplicate identifiers and year-on-year changes in identifiers.

Objectives: To improve the accuracy and availability of ALFs in the CLA dataset, and overall research quality.

Methods: Using several datasets within the SAIL Databank, we developed a six-step CLA matching algorithm to improve the ALF matching rate and correct for data errors. To assess the performance of our algorithm, we benchmarked against routine ALFs already identified via the algorithm currently used by SAIL.
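
The abstract does not list the six steps, so the following is only a minimal sketch of the general shape of a stepwise matching cascade: each step applies one rule to the records that are still unmatched, and strict rules run before looser ones. The field names (`wob`, `sex`, `area`), the two rules and the toy records are all hypothetical and are not taken from the paper or from SAIL.

```python
from typing import Callable, Optional

Record = dict[str, str]
Rule = Callable[[Record, list[Record]], Optional[str]]


def match_on(*keys: str) -> Rule:
    """Build a rule that returns an ALF only when the match is unambiguous."""
    def rule(record: Record, reference: list[Record]) -> Optional[str]:
        # Collect the ALFs of all reference records agreeing on every key.
        hits = {
            ref["alf"]
            for ref in reference
            if all(record.get(k) and record.get(k) == ref.get(k) for k in keys)
        }
        # Zero hits or several distinct ALFs: leave the record unmatched.
        return hits.pop() if len(hits) == 1 else None
    return rule


def run_cascade(records: list[Record], reference: list[Record],
                rules: list[Rule]) -> dict[str, tuple[str, int]]:
    """Apply rules in order; return record id -> (ALF, step that matched)."""
    assigned: dict[str, tuple[str, int]] = {}
    for step, rule in enumerate(rules, start=1):
        for rec in records:
            if rec["id"] in assigned:
                continue  # already matched by an earlier, stricter step
            alf = rule(rec, reference)
            if alf is not None:
                assigned[rec["id"]] = (alf, step)
    return assigned


# Hypothetical toy data, for illustration only.
reference = [
    {"alf": "A1", "wob": "2010-01-04", "sex": "F", "area": "W01"},
    {"alf": "A2", "wob": "2010-01-04", "sex": "M", "area": "W02"},
    {"alf": "A3", "wob": "2011-03-07", "sex": "F", "area": "W03"},
    {"alf": "A4", "wob": "2011-03-07", "sex": "F", "area": "W04"},
]
records = [
    {"id": "c1", "wob": "2010-01-04", "sex": "F", "area": "W01"},
    {"id": "c2", "wob": "2010-01-04", "sex": "M", "area": "W09"},
    {"id": "c3", "wob": "2011-03-07", "sex": "F", "area": "W09"},
]
rules = [match_on("wob", "sex", "area"), match_on("wob", "sex")]

assigned = run_cascade(records, reference, rules)
print(assigned)
```

In this toy run, `c1` is matched by the strict first rule, `c2` only by the looser second rule, and `c3` stays unmatched because two reference records fit it equally well. Recording the step that produced each match makes it possible to report match quality per step.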

Results: Our algorithm increased ALF matching by 25%, assigning 61% of individuals an ALF. Inconsistent weeks of birth, as well as incorrect and duplicate identifiers, were resolved. When benchmarking against the current ALF-assigning algorithm used by SAIL, our algorithm had an overall sensitivity of 90%.
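
As an illustration of the benchmarking figure, the sketch below computes sensitivity under one plausible reading: the share of records that already hold a routine ALF to which the new algorithm assigns the same ALF. The abstract does not give the paper's exact definition, so this reading, the function name and the toy records are assumptions, not the authors' implementation.

```python
from typing import Optional


def sensitivity(benchmark: dict[str, str],
                candidate: dict[str, Optional[str]]) -> float:
    """Fraction of benchmark-linked records given the same ALF by the candidate.

    benchmark: record id -> ALF from the routine (reference) algorithm.
    candidate: record id -> ALF from the new algorithm, or None if unmatched.
    """
    if not benchmark:
        raise ValueError("benchmark must contain at least one linked record")
    agree = sum(
        1 for rec_id, alf in benchmark.items() if candidate.get(rec_id) == alf
    )
    return agree / len(benchmark)


# Hypothetical toy data: ten records with a routine ALF.
benchmark = {f"r{i}": f"ALF{i}" for i in range(10)}
# The candidate agrees on nine of them and leaves one unmatched.
candidate: dict[str, Optional[str]] = dict(benchmark)
candidate["r9"] = None

print(f"sensitivity = {sensitivity(benchmark, candidate):.2f}")
```

Under this definition, a record that the candidate links to a different ALF counts against sensitivity in the same way as an unmatched record. A fuller evaluation would report the two cases separately.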

Conclusion: We have developed an algorithm whose ALF matching performance is comparable to that of the current algorithm used within SAIL, and which greatly improves ALF matching in the CLA dataset. This algorithm may help to overcome potential bias due to missing data, and increases the potential for linkage to other datasets. Further development and refinement could result in the algorithm being applied to other datasets in SAIL.

