Kavita Batra, Vidhani S. Goel, Ana L. Reyes, Bertille Assoumou, Dodds P. Simangan, Farooq Abdulla, Deborah A. Kuhls
Unifying and linking data sources in medical and public health research
Journal of Medicine, Surgery, and Public Health, volume 5, Article 100164. Published 2024-12-06. DOI: 10.1016/j.glmedi.2024.100164
https://www.sciencedirect.com/science/article/pii/S2949916X24001178
Abstract
Data linkage methods, including probabilistic, deterministic, and hybrid approaches, are critical for linking medical and public health records, expanding data scope, and improving research outcomes. These methods differ in accuracy, efficiency, and scalability. This letter seeks to identify best practices for enhancing data quality and linkage rates in healthcare and public health research using these techniques. Data linkage enhances data quality by removing duplicates and correcting artifacts, facilitates cost-effective longitudinal studies by integrating existing data, and supports public health through person-oriented statistics and disease registries. Tools like "RecordLinkage" in R and EpiLink have advanced linkage accuracy, particularly in epidemiological studies. A PubMed search in November 2023 identified 176 studies, of which 29 met the inclusion criteria. Hybrid methods showed superior accuracy, with some studies achieving linkage rates above 90%. Emerging AI-driven methods can further improve scalability, efficiency, and automation, and can employ privacy-preserving techniques like federated learning to address confidentiality concerns. However, challenges such as inconsistent data, incomplete identifiers, and technical complexities remain, emphasizing the need for standardized protocols and robust ethical frameworks. In low- and middle-income countries (LMICs), tailored strategies such as enhancing health information systems, adopting open-source tools, and fostering regional collaborations are essential to address resource constraints. Initiatives like the Western Australian Data Linkage System exemplify the potential impact of linkage on healthcare and public health. Future research should focus on refining methods, integrating diverse datasets, and leveraging AI to improve linkage efficiency and reliability. By adopting best practices, data linkage can enhance decision-making, optimize interventions, and advance global health research.
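
To make the distinction between deterministic, probabilistic, and hybrid linkage concrete, the following is a minimal Python sketch; it is not taken from the letter or from any of the tools it names. It assumes a hybrid rule that first compares a unique identifier (deterministic step) and, when that identifier is missing on either record, falls back to a Fellegi-Sunter style weight summed over name and birth-date fields (probabilistic step). The field names, the `national_id` key, the m/u probabilities, the similarity threshold, and the cutoff are all hypothetical values chosen for illustration; in practice they would be estimated from the data.

```python
from difflib import SequenceMatcher
from math import log2

# Hypothetical per-field probabilities (illustrative, not estimated from data):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_date": (0.97, 0.003),
}


def agrees(a, b, threshold=0.85):
    """Treat two strings as agreeing if their similarity ratio reaches the threshold."""
    if not a or not b:
        return False  # a missing value counts as disagreement in this sketch
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_weight(rec_a, rec_b):
    """Sum log2 agreement or disagreement weights over all configured fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if agrees(rec_a.get(field), rec_b.get(field)):
            total += log2(m / u)              # agreement raises the weight
        else:
            total += log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def hybrid_link(rec_a, rec_b, id_field="national_id", cutoff=8.0):
    """Deterministic comparison on a unique ID, else probabilistic fallback.

    Returns a (method, is_match) tuple. The cutoff is an assumed value.
    """
    id_a, id_b = rec_a.get(id_field), rec_b.get(id_field)
    if id_a and id_b:
        # Both identifiers are present: exact comparison decides the link.
        return ("deterministic", id_a == id_b)
    # Identifier missing on at least one side: score the remaining fields.
    return ("probabilistic", match_weight(rec_a, rec_b) >= cutoff)
```

A pair such as `hybrid_link(hospital_record, registry_record)` is decided by the identifier when both records carry one and by the summed weight otherwise. Production linkage would add blocking, clerical review of borderline weights, and parameter estimation, none of which this sketch attempts.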