Double Sampling for Informatively Missing Data in Electronic Health Record-Based Comparative Effectiveness Research.

Impact factor: 1.8 · Zone 4 (Medicine) · JCR Q3 (Mathematical & Computational Biology)

Statistics in Medicine · Pub Date: 2024-12-30 · Epub Date: 2024-12-05 · DOI: 10.1002/sim.10298

Alexander W Levis, Rajarshi Mukherjee, Rui Wang, Heidi Fischer, Sebastien Haneuse

Pages: 6086-6098 · PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639654/pdf/

Citations: 0

Abstract

Missing data arise in most applied settings and are ubiquitous in electronic health records (EHR). When data are missing not at random (MNAR) with respect to measured covariates, sensitivity analyses are often considered. These solutions, however, are often unsatisfying in that they are not guaranteed to yield actionable conclusions. Motivated by an EHR-based study of long-term outcomes following bariatric surgery, we consider the use of double sampling as a means to mitigate MNAR outcome data when the statistical goals are estimation and inference regarding causal effects. We describe assumptions that are sufficient for the identification of the joint distribution of confounders, treatment, and outcome under this design. Additionally, we derive efficient and robust estimators of the average causal treatment effect under a nonparametric model and under a model assuming outcomes were, in fact, initially missing at random (MAR). We compare these in simulations to an approach that adaptively estimates based on evidence of violation of the MAR assumption. Finally, we also show that the proposed double sampling design can be extended to handle arbitrary coarsening mechanisms, and derive nonparametric efficient estimators of any smooth full data functional.
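The idea of recovering from MNAR outcomes through a second-stage sample can be illustrated with a small simulation. The sketch below is not the efficient estimator derived in the paper. It is a simple inverse-probability-weighted contrast under assumptions made only for illustration: a single confounder `L`, a known propensity score `ps`, a hypothetical missingness rule that depends on the outcome itself, and a follow-up subsample drawn by simple random sampling with known probability `pi_s` and complete response. All variable names and the data-generating process are assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
TRUE_ATE = 1.0  # assumed for this toy simulation

# Hypothetical data-generating process (not the bariatric surgery data).
L = rng.normal(size=n)                      # measured confounder
ps = 1.0 / (1.0 + np.exp(-0.5 * L))         # P(A = 1 | L), treated as known here
A = rng.binomial(1, ps)                     # treatment indicator
Y = TRUE_ATE * A + L + rng.normal(size=n)   # full-data outcome

# MNAR: the chance that the EHR records Y depends on Y itself.
R = rng.binomial(1, np.where(Y < 1.0, 0.9, 0.2))

# Double sampling: follow up a simple random 30% of those initially missing.
pi_s = 0.3
S = (1 - R) * rng.binomial(1, pi_s, size=n)

# The analyst sees Y only when R = 1 or S = 1; mask the rest with 0.
Y_obs = np.where((R == 1) | (S == 1), Y, 0.0)

def weighted_ipw_ate(A, Y, ps, w):
    """Normalized (Hajek) IPW contrast of treated vs. untreated means."""
    w1 = w * A / ps
    w0 = w * (1 - A) / (1 - ps)
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

cc_est = weighted_ipw_ate(A, Y_obs, ps, R)              # complete cases only
ds_est = weighted_ipw_ate(A, Y_obs, ps, R + S / pi_s)   # reweighted double sample

print(f"complete-case: {cc_est:.3f}, double-sampling: {ds_est:.3f}")
```

In this toy setup the weight `R + S / pi_s` has conditional expectation 1 given the full data, which is why the reweighted contrast targets the full-data effect while the complete-case contrast does not. In practice the propensity score and the second-stage sampling probabilities would typically have to be estimated or fixed by design.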




Source journal: Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)

CiteScore: 3.40
Self-citation rate: 10.00%
Articles published: 334
Review time: 2-4 weeks

About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.