Accounting for bias due to outcome data missing not at random: comparison and illustration of two approaches to probabilistic bias analysis: a simulation study.
Emily Kawabata, Daniel Major-Smith, Gemma L Clayton, Chin Yang Shapland, Tim P Morris, Alice R Carter, Alba Fernández-Sanlés, Maria Carolina Borges, Kate Tilling, Gareth J Griffith, Louise A C Millard, George Davey Smith, Deborah A Lawlor, Rachael A Hughes
BMC Medical Research Methodology 24(1):278, published 2024-11-13. doi:10.1186/s12874-024-02382-4
Abstract
Background: Bias from data missing not at random (MNAR) is a persistent concern in health-related research. A bias analysis quantitatively assesses how conclusions change under different assumptions about missingness using bias parameters that govern the magnitude and direction of the bias. Probabilistic bias analysis specifies a prior distribution for these parameters, explicitly incorporating available information and uncertainty about their true values. A Bayesian bias analysis combines the prior distribution with the data's likelihood function whilst a Monte Carlo bias analysis samples the bias parameters directly from the prior distribution. No study has compared a Monte Carlo bias analysis to a Bayesian bias analysis in the context of MNAR missingness.
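The following is a minimal sketch of what "sampling the bias parameter directly from the prior" can look like for the mean of a partially observed outcome. It assumes a delta-adjustment (pattern-mixture) model in which missing outcomes have, on average, the observed mean plus a bias parameter `delta`. This is a common way to parameterise MNAR missingness, but the abstract does not say it is the model the authors used. The function name, the Normal prior and its arguments are all hypothetical. Because `delta` is never combined with a likelihood, the data cannot correct a poorly chosen prior. This is the key contrast with a Bayesian bias analysis.

```python
import math
import random
import statistics


def monte_carlo_bias_analysis(observed, n_missing, prior_mean, prior_sd,
                              n_draws=5000, seed=1):
    """Monte Carlo bias analysis for the mean of a partially observed outcome.

    Hypothetical delta-adjustment (pattern-mixture) model: missing outcomes are
    assumed to have mean mean(observed) + delta, where the bias parameter delta
    has a Normal(prior_mean, prior_sd) prior.
    """
    rng = random.Random(seed)
    n_obs = len(observed)
    p_mis = n_missing / (n_obs + n_missing)
    mean_obs = statistics.fmean(observed)
    se_obs = statistics.stdev(observed) / math.sqrt(n_obs)
    draws = []
    for _ in range(n_draws):
        delta = rng.gauss(prior_mean, prior_sd)  # bias parameter drawn straight from its prior
        mu_obs = rng.gauss(mean_obs, se_obs)     # random error in the observed-data mean
        # Overall mean = (1 - p_mis) * mu_obs + p_mis * (mu_obs + delta)
        draws.append(mu_obs + p_mis * delta)
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.fmean(draws), (lower, upper)
```

For example, `monte_carlo_bias_analysis([1.0, 2.0, 3.0, 4.0, 5.0], n_missing=5, prior_mean=-1.0, prior_sd=0.5)` should centre close to 3 + 0.5 × (−1) = 2.5. In this example, half the outcomes are missing and are believed to be about one unit lower than the observed ones.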
Methods: We illustrate an accessible probabilistic bias analysis using the Monte Carlo bias analysis approach and a well-known imputation method. We designed a simulation study based on a motivating example from the UK Biobank study, where a large proportion of the outcome was missing and missingness was suspected to be MNAR. We compared the performance of our Monte Carlo bias analysis to a principled Bayesian bias analysis, complete case analysis (CCA) and multiple imputation (MI) assuming missing at random.
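The UK Biobank-based simulation design is not described here in enough detail to reproduce. The toy simulation below only illustrates the general mechanism the study targets: when the probability of observing the outcome depends on the outcome itself, a complete case analysis (CCA) is biased, and its 95% confidence intervals under-cover. The data-generating model, the parameter `gamma` and the function name are hypothetical. Bias and coverage are computed the usual way: mean error across replicates, and the share of intervals that contain the true value.

```python
import math
import random
import statistics


def simulate_cca_performance(n=1000, n_reps=200, gamma=-1.0, seed=2024):
    """Bias and 95% CI coverage of a complete case analysis (CCA) of a mean.

    Hypothetical data-generating model: Y ~ Normal(0, 1) and
    P(Y observed | Y) = expit(0.5 + gamma * Y). gamma != 0 makes missingness
    depend on Y itself (MNAR); gamma == 0 gives data missing completely at random.
    """
    rng = random.Random(seed)
    true_mean = 0.0
    estimates, covered = [], 0
    for _ in range(n_reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Keep each value with a probability that depends on the value itself.
        obs = [v for v in y if rng.random() < 1.0 / (1.0 + math.exp(-(0.5 + gamma * v)))]
        est = statistics.fmean(obs)
        se = statistics.stdev(obs) / math.sqrt(len(obs))
        estimates.append(est)
        covered += (est - 1.96 * se) <= true_mean <= (est + 1.96 * se)
    return statistics.fmean(estimates) - true_mean, covered / n_reps
```

Setting `gamma=0.0` gives a missing-completely-at-random reference, in which the same code should show negligible bias and coverage near 95%. With `gamma=-1.0`, larger values are less often observed, so the complete-case mean is pulled downwards.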
Results: As expected, given the simulation study design, CCA and MI estimates were substantially biased, with 95% confidence interval coverages of 7-48%. Including auxiliary variables (i.e., variables not included in the substantive analysis that are predictive of missingness and the missing data) in MI's imputation model amplified the bias due to assuming missing at random. With reasonably accurate and precise information about the bias parameter, the Monte Carlo bias analysis performed as well as the Bayesian bias analysis. However, when very limited information was provided about the bias parameter, only the Bayesian bias analysis was able to eliminate most of the bias due to MNAR whilst the Monte Carlo bias analysis performed no better than the CCA and MI.
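The dependence on prior information can be illustrated, though not reproduced, with the same hypothetical delta-adjustment model. In the sketch below, one prior is centred on the true bias parameter and is precise. The other is wide and centred on zero, which corresponds to assuming no departure from missing at random. With the accurate prior, the Monte Carlo estimate recovers the full-data mean. With the vague prior, the point estimate stays at the complete-case value and only the interval widens. The selection mechanism, prior values and function names are assumptions, and the sketch does not implement the Bayesian bias analysis, whose model the abstract does not specify.

```python
import math
import random
import statistics


def mc_adjusted_mean(obs, n_mis, prior_mean, prior_sd, rng, n_draws=4000):
    """Delta-adjustment Monte Carlo bias analysis (hypothetical sketch).
    Returns the median and 95% interval of mean(obs) + p_mis * delta,
    with delta ~ Normal(prior_mean, prior_sd) drawn directly from the prior."""
    p_mis = n_mis / (len(obs) + n_mis)
    mean_obs = statistics.fmean(obs)
    se_obs = statistics.stdev(obs) / math.sqrt(len(obs))
    draws = sorted(rng.gauss(mean_obs, se_obs) + p_mis * rng.gauss(prior_mean, prior_sd)
                   for _ in range(n_draws))
    return draws[n_draws // 2], (draws[int(0.025 * n_draws)],
                                 draws[int(0.975 * n_draws) - 1])


def compare_priors(n=20000, seed=7):
    """Simulate an MNAR outcome, then compare an accurate prior with a vague one."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical MNAR mechanism: larger Y values are less likely to be observed.
    seen = [rng.random() < 1.0 / (1.0 + math.exp(-(0.5 - v))) for v in y]
    obs = [v for v, s in zip(y, seen) if s]
    mis = [v for v, s in zip(y, seen) if not s]
    # The true bias parameter is only knowable because this is a simulation.
    true_delta = statistics.fmean(mis) - statistics.fmean(obs)
    return {
        "truth": statistics.fmean(y),
        "cca": statistics.fmean(obs),
        "informative": mc_adjusted_mean(obs, len(mis), true_delta, 0.05, rng),
        "vague": mc_adjusted_mean(obs, len(mis), 0.0, 2.0, rng),
    }
```

Because the full-data mean equals `mean(obs) + p_mis * true_delta` exactly, a prior concentrated on `true_delta` removes the bias. A prior centred on zero cannot move the estimate away from the complete-case value.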
Conclusion: The Monte Carlo bias analysis we describe is easy to implement in standard software and, in the setting we explored, is a viable alternative to a Bayesian bias analysis. We advise careful consideration of the choice of auxiliary variables when applying imputation to data that may be MNAR.