Second-Order Inference for the Mean of a Variable Missing at Random.

Impact factor 1.2 · Zone 4, Mathematics · JCR Q4, MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics Pub Date : 2016-05-01 DOI:10.1515/ijb-2015-0031

Iván Díaz, Marco Carone, Mark J van der Laan

Volume 12, Issue 1, pages 333-49.

Citations: 12

Abstract

We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE always had a coverage probability equal to or closer to the nominal value 0.95, compared to its first-order counterpart. In the best-case scenario, the proposed second-order TMLE had a coverage probability of 0.86 when the first-order TMLE had a coverage probability of zero. We also present a novel first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have improved finite-sample performance compared to existing first-order estimators. In the best-case scenario of our simulation study, the novel first-order TMLE improved the coverage probability from 0 to 0.90. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.
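For orientation, the standard first-order doubly robust approach that the abstract treats as the baseline can be sketched as follows. This is not the authors' second-order TMLE and not their R code. It is a minimal augmented inverse-probability-weighted (AIPW) sketch in Python, assuming a single covariate, a logistic working model for the missingness score, and a linear working model for the outcome regression. The function names `fit_logistic` and `aipw_mean`, and the simulated data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_mean(W, A, Y):
    """First-order doubly robust (AIPW) estimate of E[Y]; Y is observed only when A == 1."""
    n = len(A)
    X = np.column_stack([np.ones(n), W])
    # Missingness score g(W) = P(A = 1 | W), logistic working model (an assumption).
    g = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, A)))
    # Outcome regression m(W) = E[Y | A = 1, W], linear working model fit on observed rows.
    obs = A == 1
    coef, *_ = np.linalg.lstsq(X[obs], Y[obs], rcond=None)
    m = X @ coef
    # Regression prediction plus inverse-probability-weighted residual correction.
    resid = np.where(obs, Y - m, 0.0)
    phi = m + A / g * resid
    est = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(n)  # standard error from the sample variance of phi
    return est, se

# Hypothetical simulated data: missingness depends on W only (missing at random).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + W))))
Y_full = 1.0 + 2.0 * W + rng.normal(size=n)   # true mean of Y is 1.0
Y = np.where(A == 1, Y_full, np.nan)          # outcome is missing when A == 0

est, se = aipw_mean(W, A, Y)
complete_case = np.nanmean(Y)                 # naive mean over observed outcomes only
print(f"AIPW estimate: {est:.3f} (SE {se:.3f}); complete-case mean: {complete_case:.3f}")
print(f"95% Wald interval: [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}]")
```

In this toy setup both working models are correctly specified, so the sketch says nothing about the slow-convergence regimes the paper studies; it only shows the structure of the first-order estimator and the Wald interval whose coverage the simulation in the abstract evaluates.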

Source journal

International Journal of Biostatistics (MATHEMATICAL & COMPUTATIONAL BIOLOGY; STATISTICS & PROBABILITY)

CiteScore: 2.10

Self-citation rate: 8.30%

Articles published: (no value listed)

Review time: >12 weeks

About the journal: The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, and original applications of statistical methods to important practical problems arising from the biological, medical, public health, and agricultural sciences, with an emphasis on semiparametric methods. Given that many publication venues exist within biostatistics, IJB offers a place to publish research in biostatistics focusing on modern methods, often based on machine learning and other data-adaptive methodologies, and provides a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. IJB is intended to cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows data and software code to be appended, and opens the door to reproducible research by allowing readers to easily replicate the analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.