Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random.

IF 3.9; Zone 3, Medicine; Q1, HEALTH CARE SCIENCES & SERVICES
Elinor Curnow, Rosie P Cornish, Jon E Heron, James R Carpenter, Kate Tilling
BMC Medical Research Methodology, 24(1):231. Journal Article. Published 2024-10-07. DOI: 10.1186/s12874-024-02353-9 (https://doi.org/10.1186/s12874-024-02353-9). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11457445/pdf/
Citations: 0

Abstract

Background: Epidemiological and clinical studies often have missing data, frequently analysed using multiple imputation (MI). In general, MI estimates will be biased if data are missing not at random (MNAR). Bias due to data MNAR can be reduced by including other variables ("auxiliary variables") in imputation models, in addition to those required for the substantive analysis. Common advice is to take an inclusive approach to auxiliary variable selection (i.e. include all variables thought to be predictive of missingness and/or the missing values). There are no clear guidelines about the impact of this strategy when data may be MNAR.

Methods: We explore the impact of including an auxiliary variable predictive of missingness but, in truth, unrelated to the partially observed variable, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of the additional bias of the MI estimator for the exposure coefficient (fitting either a linear or logistic regression model), when the (continuous or binary) partially observed variable is either the analysis outcome or the exposure. Here, "additional bias" refers to the difference in magnitude of the MI estimator when the imputation model includes (i) the auxiliary variable and the other analysis model variables; (ii) just the other analysis model variables, noting that both will be biased due to data MNAR. We illustrate the extent of this additional bias by re-analysing data from a birth cohort study.

Results: The additional bias can be relatively large when the outcome is partially observed and missingness is caused by the outcome itself, and even larger if missingness is caused by both the outcome and the exposure (when either the outcome or exposure is partially observed).

Conclusions: When using MI, the naïve and commonly used strategy of including all available auxiliary variables should be avoided. We recommend including the variables most predictive of the partially observed variable as auxiliary variables, where these can be identified through consideration of the plausible causal diagrams and missingness mechanisms, as well as data exploration (noting that associations with the partially observed variable in the complete records may be distorted due to selection bias).
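
The comparison described in the Methods can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' simulation code: a continuous outcome `y` is partially observed, its missingness depends on `y` itself (MNAR) and on an auxiliary variable `z` that is independent of both `x` and `y`, and the exposure coefficient is estimated after imputing `y` with and without `z` in the imputation model. All names, parameter values, and the missingness model are hypothetical. Imputation here is stochastic regression imputation repeated `m` times, which does not draw the imputation model parameters and is therefore a simplification of proper MI.

```python
import numpy as np


def ols(design, y):
    """Return least-squares coefficients and the residual standard deviation."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma = np.sqrt(resid @ resid / (len(y) - design.shape[1]))
    return coef, sigma


def simulate(n=20000, beta=1.0, m=20, seed=0):
    """Estimate the exposure coefficient with and without the auxiliary variable."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                 # exposure, fully observed
    z = rng.normal(size=n)                 # auxiliary: unrelated to x and y
    y = beta * x + rng.normal(size=n)      # outcome, partially observed

    # Assumed MNAR mechanism: missingness depends on y itself and on z.
    p_miss = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * y + 1.0 * z)))
    observed = rng.random(n) > p_miss
    n_mis = int((~observed).sum())

    ones = np.ones(n)
    analysis = np.column_stack([ones, x])  # analysis model: y ~ x
    imputation_designs = {
        "without_aux": np.column_stack([ones, x]),
        "with_aux": np.column_stack([ones, x, z]),
    }

    results = {"full_data": float(ols(analysis, y)[0][1])}
    for name, design in imputation_designs.items():
        # Fit the imputation model on the complete records only.
        coef, sigma = ols(design[observed], y[observed])
        slopes = []
        for _ in range(m):
            y_imp = y.copy()
            noise = rng.normal(scale=sigma, size=n_mis)
            y_imp[~observed] = design[~observed] @ coef + noise
            slopes.append(ols(analysis, y_imp)[0][1])
        # The pooled point estimate is the mean over the m imputed data sets.
        results[name] = float(np.mean(slopes))
    return results


if __name__ == "__main__":
    for label, estimate in simulate().items():
        print(f"{label}: {estimate:.3f}")
```

Comparing `with_aux` and `without_aux` against the true `beta` gives a rough view of the additional bias in this one scenario. The result depends on the assumed coefficients in the missingness model, so it should not be read as a reproduction of the paper's findings.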

Source journal
BMC Medical Research Methodology (Medicine: Health Care)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks
Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.