Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

IF 1.2 · Zone 4 (Mathematics) · Q4 · MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics. Pub Date: 2017-04-20. DOI: 10.1515/ijb-2016-0053

Yanmei Xie, Biao Zhang

Citations: 5

Abstract

Missing covariate data arise often in regression analyses in the health and social sciences as well as in survey sampling. We study methods for analyzing a nonignorable covariate-missing data problem under an assumed conditional mean function when some covariates are completely observed but others are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719–30) on regression analyses with nonignorable missing covariates, which introduced two working models: the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems, with the objective of using the two working models effectively in the analysis. We propose a unified approach to constructing a system of unbiased estimating equations in which there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with those of existing competitors. We present a simulation study comparing the finite-sample performance of the various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
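
As a rough illustration of what "optimally combining more estimating equations than parameters" by empirical likelihood means, the sketch below works through a toy problem that is not taken from the paper: there is no missing data and no working model, only a scalar mean theta with two unbiased estimating functions. The second function assumes, purely for illustration, that the variance is known to equal 1. All function names (`estimating_functions`, `solve_lambda`, `max_el_estimate`) and the grid search over theta are hypothetical choices made for this example, not the authors' algorithm.

```python
import math
import random


def estimating_functions(x, theta):
    # Two unbiased estimating functions for one parameter (toy assumption:
    # the variance is known to equal 1, so E[x^2] = theta^2 + 1).
    return (x - theta, x * x - theta * theta - 1.0)


def log_el_objective(lam, g):
    # sum_i log(1 + lam' g_i); -inf if an implied weight would be non-positive.
    total = 0.0
    for g1, g2 in g:
        w = 1.0 + lam[0] * g1 + lam[1] * g2
        if w <= 1e-10:
            return -math.inf
        total += math.log(w)
    return total


def solve_lambda(g, max_iter=100, tol=1e-9):
    # Damped Newton ascent for the Lagrange multiplier (a concave problem).
    lam, value = (0.0, 0.0), 0.0
    for _ in range(max_iter):
        b1 = b2 = a11 = a12 = a22 = 0.0
        for g1, g2 in g:
            w = 1.0 + lam[0] * g1 + lam[1] * g2
            b1 += g1 / w
            b2 += g2 / w
            a11 += g1 * g1 / (w * w)
            a12 += g1 * g2 / (w * w)
            a22 += g2 * g2 / (w * w)
        if abs(b1) + abs(b2) < tol:
            break  # gradient is (numerically) zero: constraints are met
        det = a11 * a22 - a12 * a12
        if det <= 0.0:
            break
        # Newton direction: solve the 2x2 system A d = b explicitly.
        d1 = (a22 * b1 - a12 * b2) / det
        d2 = (a11 * b2 - a12 * b1) / det
        step, accepted = 1.0, False
        while step > 1e-8:
            trial = (lam[0] + step * d1, lam[1] + step * d2)
            trial_value = log_el_objective(trial, g)
            if trial_value >= value - 1e-12:
                lam, value, accepted = trial, trial_value, True
                break
            step /= 2.0  # halve the step until the objective does not drop
        if not accepted:
            break
    return lam, value


def max_el_estimate(xs, half_width=0.5, n_grid=201):
    # Minimise the profile statistic sum_i log(1 + lam(theta)' g_i) over a
    # grid of theta values centred at the sample mean.
    xbar = sum(xs) / len(xs)
    best = None
    for k in range(n_grid):
        theta = xbar - half_width + 2.0 * half_width * k / (n_grid - 1)
        g = [estimating_functions(x, theta) for x in xs]
        lam, value = solve_lambda(g)
        if best is None or value < best[0]:
            best = (value, theta, lam)
    return best[1], best[2], best[0]


random.seed(0)
xs = [random.gauss(2.0, 1.0) for _ in range(200)]
theta_hat, lam_hat, stat = max_el_estimate(xs)

# Implied empirical likelihood weights p_i = 1 / (n * (1 + lam' g_i)).
weights = [
    1.0 / (len(xs) * (1.0 + lam_hat[0] * g1 + lam_hat[1] * g2))
    for g1, g2 in (estimating_functions(x, theta_hat) for x in xs)
]
print("sample mean:", sum(xs) / len(xs))
print("EL estimate:", theta_hat)
```

The grid search keeps the outer step transparent at the cost of resolution; a real analysis would more likely use a gradient-based optimiser and vector-valued regression parameters.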

Source journal

International Journal of Biostatistics MATHEMATICAL & COMPUTATIONAL BIOLOGY-STATISTICS & PROBABILITY

CiteScore: 2.10

Self-citation rate: 8.30%

Articles published:

Review time: >12 weeks

About the journal: The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, and original applications of statistical methods to important practical problems arising in the biological, medical, public health, and agricultural sciences, with an emphasis on semiparametric methods. Given that many publication venues exist within biostatistics, IJB offers a home for biostatistics research focused on modern methods, often based on machine learning and other data-adaptive methodologies, and provides a unique reading experience that compels the author to be explicit about the statistical inference problem the paper addresses. The journal is intended to cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows data and software code to be appended, opening the door to reproducible research in which readers can easily replicate the analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.