Evaluating the Impact of Data Standardization on Real-World Data
Elizabeth M Garry, Aidan Baglivo, Priya Govil, Jennifer L Duryea, Wei Liu, Tamar Lasky, Aloka Chakravarty, Donna R Rivera, Marie C Bradley
Pharmacoepidemiology and Drug Safety, 34(8): e70191. Published 2025-08-01. DOI: 10.1002/pds.70191 (https://doi.org/10.1002/pds.70191)
Abstract
Purpose: To understand the impact of standardizing administrative healthcare data to the Sentinel common data model for cohort selection and descriptive findings.
Methods: Among patients with an outpatient COVID-19 diagnosis (January 2021-December 2022) in HealthVerity using the data in its native and the standardized format, we descriptively compared cohort attrition and sample size, patient characteristics, and healthcare resource utilization during baseline and incidence of selected conditions after COVID-19 diagnosis.
Results: The standardized cohort included fewer patients than the native (164 445 vs. 198 317), but age (median 48 years) and sex (70% female) were the same. The distribution of race was similar; however, the standardized cohort mapped patients with "Other" race to the "Unknown/Missing" race category, which created differences among those categories. Distributions were similar, albeit slightly lower for comorbidities (differences < 1%), and lower for SARS-CoV-2 diagnostic tests (59% vs. 70%). Medical encounter counts were also lower, with substantial differences that were attenuated after limiting encounter counts to one event per day (e.g., mean count of 6.0 vs. 27.7 specialty care visits reduced to 2.9 vs. 3.5). Incidence rates were lower, with the greatest difference for hepatotoxicity (29.6 vs. 37.1 per 1000 person-years).
Conclusions: The data standardization refines the data (e.g., removes duplicate claims and variables or variable categories), which may reduce outliers and errors but yield lower distributions and counts of certain variables than observed in native format data. Therefore, it is critical to understand how standardization impacts the data and subsequently its fitness for use.
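The Results note that differences in encounter counts shrank once counts were limited to one event per day. The sketch below illustrates that idea only; it is not the authors' code or the Sentinel implementation. The function name `mean_encounters`, the `(patient_id, service_date)` tuple layout, and the toy claims are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def mean_encounters(claims, one_per_day=False):
    """Mean encounter count per patient.

    claims: iterable of (patient_id, service_date) tuples, one per claim
    line (a hypothetical layout, not the HealthVerity or Sentinel schema).
    When one_per_day is True, repeated claims for the same patient on the
    same date are counted as a single encounter.
    """
    dates_by_patient = defaultdict(list)
    for patient_id, service_date in claims:
        dates_by_patient[patient_id].append(service_date)

    counts = [
        len(set(dates)) if one_per_day else len(dates)
        for dates in dates_by_patient.values()
    ]
    return sum(counts) / len(counts)


# Toy data (invented): p1 has three claim lines on one day plus one more visit.
claims = [
    ("p1", date(2021, 3, 1)),
    ("p1", date(2021, 3, 1)),
    ("p1", date(2021, 3, 1)),
    ("p1", date(2021, 4, 2)),
    ("p2", date(2021, 5, 9)),
    ("p2", date(2021, 5, 9)),
]

raw_mean = mean_encounters(claims)                      # counts every claim line
daily_mean = mean_encounters(claims, one_per_day=True)  # one event per day
print(raw_mean, daily_mean)
```

In this toy input, repeated same-day claim lines inflate the raw mean, and the one-per-day rule removes that inflation. Patients with no claims at all do not appear in the input and so are not part of the denominator here; a real analysis would take the denominator from the cohort file.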
About the journal:
The aim of Pharmacoepidemiology and Drug Safety is to provide an international forum for the communication and evaluation of data, methods and opinion in the discipline of pharmacoepidemiology. The Journal publishes peer-reviewed reports of original research, invited reviews and a variety of guest editorials and commentaries embracing scientific, medical, statistical, legal and economic aspects of pharmacoepidemiology and post-marketing surveillance of drug safety. Appropriate material in these categories may also be considered for publication as a Brief Report.
Particular areas of interest include:
design, analysis, results, and interpretation of studies looking at the benefit or safety of specific pharmaceuticals, biologics, or medical devices, including studies in pharmacovigilance, postmarketing surveillance, pharmacoeconomics, patient safety, molecular pharmacoepidemiology, or any other study within the broad field of pharmacoepidemiology;
comparative effectiveness research relating to pharmaceuticals, biologics, and medical devices. Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition, as these methods are truly used in the real world;
methodologic contributions of relevance to pharmacoepidemiology, whether original contributions, reviews of existing methods, or tutorials for how to apply the methods of pharmacoepidemiology;
assessments of harm versus benefit in drug therapy;
patterns of drug utilization;
relationships between pharmacoepidemiology and the formulation and interpretation of regulatory guidelines;
evaluations of risk management plans and programmes relating to pharmaceuticals, biologics and medical devices.