Extrapolation before imputation reduces bias when imputing censored covariates
Sarah C Lotspeich, Tanya P Garcia
Journal of Computational and Graphical Statistics, published 2025-02-05
DOI: 10.1080/10618600.2024.2444323
Citations: 0
Abstract
Modeling symptom progression to identify ideal subjects for a Huntington's disease clinical trial is problematic since time to diagnosis, a key covariate, can be heavily censored. Imputation is an appealing strategy that replaces the censored covariate with its conditional mean, but existing methods saw over 200% bias under heavy censoring. Calculating conditional means well requires estimating and then integrating over the survival function of the censored covariate from the censored value to infinity. To estimate the survival function flexibly, existing methods use the semiparametric Cox model with Breslow's estimator, leaving the integrand for the conditional means (the survival function) undefined beyond the observed data. The integral is then estimated up to the largest observed covariate value, and this approximation can cut off the tail of the survival function and lead to severe bias. We combine the semiparametric survival estimator with a parametric extension to approximate the integral up to infinity. In simulations, our proposed extrapolation-before-imputation approach substantially reduces the bias seen with existing imputation methods, sometimes even when the parametric extension was misspecified. We further demonstrate how imputing with corrected conditional means can prioritize subjects for clinical trials. The R code to reproduce results is available in the Supplementary Material.
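The conditional-mean step described above can be sketched briefly. For a covariate censored at c, the imputed value is E[X | X > c] = c + (1/S(c)) ∫_c^∞ S(t) dt. The sketch below is a minimal illustration under simplifying assumptions, not the authors' R implementation.

- It uses a Kaplan-Meier estimate with no other covariates in place of the Cox model with Breslow's estimator.
- As one possible parametric extension, it uses an exponential tail anchored at the last step of the survival curve. The tail rate is the exponential maximum-likelihood estimate: number of events divided by total observed time.
- The function names `km_curve`, `surv_at`, `area_up_to_max`, and `conditional_mean` are hypothetical, and so is the toy data.

```python
# Hypothetical sketch: Kaplan-Meier (no covariates) stands in for Cox + Breslow,
# and an exponential tail is an assumed choice of parametric extension.
import numpy as np


def km_curve(w, delta):
    """Kaplan-Meier estimate of S(t) = P(X > t) from right-censored values.

    w: observed values min(X, C); delta: 1 if X was observed, 0 if censored.
    Returns the sorted distinct observed values and S at each of them.
    """
    w = np.asarray(w, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(w)
    surv = np.empty(len(times))
    s = 1.0
    for k, t in enumerate(times):
        at_risk = np.sum(w >= t)                  # subjects still at risk at t
        events = np.sum((w == t) & (delta == 1))  # uncensored values equal to t
        s *= 1.0 - events / at_risk
        surv[k] = s
    return times, surv


def surv_at(t, times, surv):
    """Evaluate the right-continuous step function S at t (S = 1 before the first time)."""
    idx = np.searchsorted(times, t, side="right") - 1
    return 1.0 if idx < 0 else float(surv[idx])


def area_up_to_max(c, times, surv):
    """Integral of the step function S from c up to the largest observed value."""
    pts = np.concatenate(([c], times[times > c]))
    return sum(surv_at(a, times, surv) * (b - a) for a, b in zip(pts[:-1], pts[1:]))


def conditional_mean(c, times, surv, tail_rate=None):
    """Impute a value censored at c with c + (1 / S(c)) * integral_c^inf S(t) dt.

    tail_rate=None: stop the integral at the largest observed value (truncated).
    tail_rate=r:    extend S past t_max as S(t_max) * exp(-r * (t - t_max)), whose
                    integral from t_max to infinity is S(t_max) / r (extrapolated).
    Assumes S(c) > 0, which holds for a censored c when no event is tied with it.
    """
    area = area_up_to_max(c, times, surv)
    if tail_rate is not None:
        area += surv[-1] / tail_rate
    return c + area / surv_at(c, times, surv)


# Toy data (hypothetical): values 1..5, with 2, 4, and 5 censored.
w = [1, 2, 3, 4, 5]
delta = [1, 0, 1, 0, 0]
times, surv = km_curve(w, delta)
rate = sum(delta) / sum(w)  # exponential MLE under right censoring: 2 / 15

print(round(conditional_mean(2, times, surv), 3))        # 4.333 (truncated)
print(round(conditional_mean(2, times, surv, rate), 3))  # 9.333 (extrapolated)
print(round(conditional_mean(5, times, surv), 3))        # 5.0 (imputes c itself)
```

With this toy data, the largest observation is censored, so S(t_max) is still well above zero. In that case the truncated version imputes a value censored at 5 with 5 itself, because no area remains beyond t_max. The exponential extension instead adds S(t_max)/r of tail area. The choice of parametric family for the tail is a modeling decision. The abstract reports that the approach can reduce bias sometimes even when that extension is misspecified.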
About the journal:
The Journal of Computational and Graphical Statistics (JCGS) presents the latest techniques for improving and extending the use of computational and graphical methods in statistics and data analysis. Established in 1992, the journal publishes cutting-edge research, data, surveys, and more on numerical and graphical displays and methods, and on perception. Articles are written for readers who have a strong background in statistics but are not necessarily experts in computing. It is published in March, June, September, and December.