Choosing Imputation Models


Political Analysis. Pub Date: 2021-07-12. DOI: 10.1017/pan.2021.39

M. Marbach


Abstract

Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
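The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the accompanying R package and does not reproduce its interface. Two simplifications are assumed: post-stratification on a single discrete covariate stands in for stable balancing weights, and a weighted Kolmogorov-Smirnov statistic serves as one possible discrepancy statistic. All function names and the toy data are hypothetical.

```python
from collections import Counter


def stratification_weights(x_obs, x_mis):
    """Weight observed cases so their distribution of a discrete covariate
    matches that of the cases with missing values.
    ASSUMPTION: a simple stand-in for stable balancing weights."""
    n_obs, n_mis = len(x_obs), len(x_mis)
    count_obs, count_mis = Counter(x_obs), Counter(x_mis)
    raw = [(count_mis[x] / n_mis) / (count_obs[x] / n_obs) for x in x_obs]
    total = sum(raw)
    return [w / total for w in raw]  # normalized to sum to 1


def weighted_ks(y_obs, w_obs, y_imp):
    """Largest absolute gap between the weighted ECDF of observed values
    and the unweighted ECDF of imputed values."""
    obs = sorted(zip(y_obs, w_obs))
    imp = sorted(y_imp)
    step = 1.0 / len(imp)
    i = j = 0
    f_obs = f_imp = 0.0
    gap = 0.0
    for t in sorted(set(y_obs) | set(y_imp)):
        while i < len(obs) and obs[i][0] <= t:
            f_obs += obs[i][1]
            i += 1
        while j < len(imp) and imp[j] <= t:
            f_imp += step
            j += 1
        gap = max(gap, abs(f_obs - f_imp))
    return gap


# Hypothetical toy data: y tends to be larger when x == 1, and cases with
# x == 1 are more likely to have y missing (so not missing completely at random).
x_obs = [0, 0, 0, 0, 0, 0, 1, 1]
y_obs = [1, 2, 3, 1, 2, 3, 7, 8]
x_mis = [1, 1, 1, 0]

# Imputed values from two hypothetical candidate models for the four missing cases.
y_imp_a = [7, 8, 7, 2]  # model A conditions on x
y_imp_b = [1, 2, 3, 2]  # model B ignores x

w_obs = stratification_weights(x_obs, x_mis)
ks_a = weighted_ks(y_obs, w_obs, y_imp_a)
ks_b = weighted_ks(y_obs, w_obs, y_imp_b)
best = "A" if ks_a < ks_b else "B"
print(ks_a, ks_b, best)
```

In this toy setup the model that conditions on the covariate yields the smaller discrepancy after weighting, so it would be the one adopted. With several continuous covariates, the weighting step would have to be replaced by a proper balancing method such as the stable balancing weights the letter recommends.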

