Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

Elinor Curnow, Kate Tilling, Jon E. Heron, Rosie P. Cornish, James R. Carpenter
Journal: Frontiers in Epidemiology. Published: 2023-09-15. DOI: 10.3389/fepid.2023.1237447 (https://doi.org/10.3389/fepid.2023.1237447)
Citation count: 1

Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a “collider”), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
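
The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the causal structure, the parameter values, and all function names are assumptions chosen for illustration. It assumes a binary exposure X, a partially observed outcome Y, an unmeasured U that causes both Y and the auxiliary variable Z, and an unmeasured W that causes both Z and the probability that Y is observed, which also depends on X. It compares a complete records analysis, MI without Z, and MI with Z in the imputation model, using a normal linear imputation model and pooling point estimates by averaging.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients of y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def impute_once(rng, design, y, observed):
    """Fill missing y with one draw from a Bayesian normal linear model
    fitted to the complete records (parameters are drawn, then values)."""
    d_obs, y_obs = design[observed], y[observed]
    xtx_inv = np.linalg.inv(d_obs.T @ d_obs)
    coef_hat = xtx_inv @ d_obs.T @ y_obs
    resid = y_obs - d_obs @ coef_hat
    df = d_obs.shape[0] - d_obs.shape[1]
    sigma2 = resid @ resid / rng.chisquare(df)       # draw residual variance
    chol = np.linalg.cholesky(sigma2 * xtx_inv)
    coef_draw = coef_hat + chol @ rng.standard_normal(coef_hat.size)
    y_imp = y.copy()
    n_mis = int((~observed).sum())
    noise = rng.normal(0.0, np.sqrt(sigma2), n_mis)
    y_imp[~observed] = design[~observed] @ coef_draw + noise
    return y_imp


def mi_slope(rng, aux_columns, x, y, observed, m=20):
    """Exposure coefficient from MI, averaged over m imputed data sets."""
    ones = np.ones_like(x)
    imp_design = np.column_stack([ones, x, *aux_columns])
    analysis_design = np.column_stack([ones, x])
    slopes = [
        ols(analysis_design, impute_once(rng, imp_design, y, observed))[1]
        for _ in range(m)
    ]
    return float(np.mean(slopes))


def run_simulation(seed=2023, n=50_000, beta=0.5):
    """Assumed structure: U -> Y, U -> Z, W -> Z, W -> R, X -> Y, X -> R."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n).astype(float)        # exposure, fully observed
    u = rng.normal(size=n)                           # common cause of Y and Z
    w = rng.normal(size=n)                           # common cause of Z and R
    y_full = beta * x + u + rng.normal(size=n)       # outcome before deletion
    z = u + w + rng.normal(scale=0.5, size=n)        # collider auxiliary
    p_obs = 1.0 / (1.0 + np.exp(-(3.0 - 4.0 * x + 2.0 * w)))
    observed = rng.random(n) < p_obs                 # R = 1 when Y is observed
    y = np.where(observed, y_full, np.nan)
    design = np.column_stack([np.ones(n), x])
    return {
        "cra": float(ols(design[observed], y[observed])[1]),
        "mi_without_z": mi_slope(rng, [], x, y, observed),
        "mi_with_z": mi_slope(rng, [z], x, y, observed),
    }


results = run_simulation()
for name, estimate in results.items():
    print(f"{name}: {estimate:.3f}")
```

Under these assumed parameters, the complete records estimate and the MI estimate without Z should both sit close to the true coefficient of 0.5, while the MI estimate with Z should be pulled noticeably toward zero. The reason is that Y is independent of missingness given X, but conditioning on Z opens the path Y ← U → Z ← W → R, so the imputation model fitted to complete records no longer describes the records with missing Y.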