Sudha R Raman, Bradley G Hammill, Pamela A Shaw, Hana Lee, Sengwee Toh, John G Connolly, Kimberly J Dandreo, Vinit Nalawade, Fang Tian, Wei Liu, Jie Li, José J Hernández-Muñoz, Robert J Glynn, Rishi J Desai, Janick Weberpals
{"title":"Analyzing missingness patterns in real-world data using the SMDI toolkit: application to a linked EHR-claims pharmacoepidemiology study.","authors":"Sudha R Raman, Bradley G Hammill, Pamela A Shaw, Hana Lee, Sengwee Toh, John G Connolly, Kimberly J Dandreo, Vinit Nalawade, Fang Tian, Wei Liu, Jie Li, José J Hernández-Muñoz, Robert J Glynn, Rishi J Desai, Janick Weberpals","doi":"10.1186/s12874-024-02330-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Missing data in confounding variables present a frequent challenge in generating evidence using real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and based on the toolkit results about likely missingness mechanisms, illustrate the decision-making process for analyses in an empirical case example.</p><p><strong>Methods: </strong>We utilized the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in the context of a pharmacoepidemiology study comparing cardiovascular outcomes of initiating sodium-glucose-cotransporter-2 inhibitors (SGLT2i) and dipeptidyl peptidase-4 inhibitors (DPP-4i) among older adults. The study used a linked EHR-Medicare claims dataset from Duke Health patients (2015-2017), focusing on partially observed confounders from EHR data (HbA1c lab and body mass index [BMI] values). Our analysis incorporated SMDI's descriptive functions and diagnostic tests to explore missingness patterns and determine missingness mitigation approaches. We used findings from these investigations to inform estimation of adjusted hazard ratios comparing the two classes of medications.</p><p><strong>Results: </strong>High levels of missingness were noted for important confounding variables including HbA1c (63.6%) and BMI (16.5%). Diagnostic tests resulted in output that described: 1) the distributions of patient characteristics, exposure, and outcome between patients with or without an observed value of the partially observed covariate, 2) the ability to predict missingness based on observed covariates, and 3) estimate if the missingness of a partially observed covariate is differential with respect to the outcome. There was evidence that missingness could be sufficiently described using observed data, which allowed multiple imputation by chained equations using random forests to address missing confounder data in estimating treatment effects. Multiple imputation resulted in improved alignment of effect estimates with previous studies.</p><p><strong>Conclusions: </strong>We were able to demonstrate the practical application of the SMDI toolkit in a real-world setting. Application of the SMDI toolkit and the resulting insights of potential missingness patterns can inform the choice of appropriate analytic methods and increase transparency of research methods in handling missing data. This type of approach can inform analytic decision making and may increase our ability to generate evidence from real-world data.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"246"},"PeriodicalIF":3.9000,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11490010/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Research Methodology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12874-024-02330-2","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Missing data in confounding variables present a frequent challenge in generating evidence using real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and based on the toolkit results about likely missingness mechanisms, illustrate the decision-making process for analyses in an empirical case example.
Methods: We utilized the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in the context of a pharmacoepidemiology study comparing cardiovascular outcomes of initiating sodium-glucose-cotransporter-2 inhibitors (SGLT2i) and dipeptidyl peptidase-4 inhibitors (DPP-4i) among older adults. The study used a linked EHR-Medicare claims dataset from Duke Health patients (2015-2017), focusing on partially observed confounders from EHR data (HbA1c lab and body mass index [BMI] values). Our analysis incorporated SMDI's descriptive functions and diagnostic tests to explore missingness patterns and determine missingness mitigation approaches. We used findings from these investigations to inform estimation of adjusted hazard ratios comparing the two classes of medications.
Results: High levels of missingness were noted for important confounding variables including HbA1c (63.6%) and BMI (16.5%). Diagnostic tests resulted in output that described: 1) the distributions of patient characteristics, exposure, and outcome between patients with or without an observed value of the partially observed covariate, 2) the ability to predict missingness based on observed covariates, and 3) estimate if the missingness of a partially observed covariate is differential with respect to the outcome. There was evidence that missingness could be sufficiently described using observed data, which allowed multiple imputation by chained equations using random forests to address missing confounder data in estimating treatment effects. Multiple imputation resulted in improved alignment of effect estimates with previous studies.
Conclusions: We were able to demonstrate the practical application of the SMDI toolkit in a real-world setting. Application of the SMDI toolkit and the resulting insights of potential missingness patterns can inform the choice of appropriate analytic methods and increase transparency of research methods in handling missing data. This type of approach can inform analytic decision making and may increase our ability to generate evidence from real-world data.
期刊介绍:
BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.