{"title":"Sharp minimax distribution estimation for current status censoring with or without missing","authors":"S. Efromovich","doi":"10.1214/20-AOS1970","DOIUrl":null,"url":null,"abstract":"Nonparametric estimation of the cumulative distribution function and the probability density of a lifetime X modified by a current status censoring (CSC), including cases of right and left missing data, is a classical ill-posed problem with biased data. The biased nature of CSC data may preclude us from consistent estimation unless the biasing function is known or may be estimated, and its ill-posed nature slows down rates of convergence. Under a traditionally studied CSC, we observe a sample from $(Z,\\Delta )$ where a continuous monitoring time $Z$ is independent of $X$, $\\Delta :=I(X\\leq Z)$ is the status, and the bias of observations is created by the density of $Z$ which is estimable. In presence of right or left missing, we observe corresponding samples from $(\\Delta Z,\\Delta )$ or $((1-\\Delta )Z,\\Delta )$; the data are again biased but now the density of $Z$ cannot be estimated from the data. As a result, to solve the estimation problem, either the density of $Z$ must be known (like in a controlled study) or an extra cross-sectional sampling of $Z$, which is typically simpler than an underlying CSC study, be conducted. The main aim of the paper is to develop for this biased and ill-posed problem the theory of efficient (sharp-minimax) estimation which is inspired by known results for the case of directly observed $X$. Among interesting aspects of the developed theory: (i) While sharp-minimax analysis of missing CSC may follow the classical Pinsker’s methodology, analysis of CSC requires a more complicated estimation procedure based on a special smoothing in both frequency and time domains; (ii) Efficient estimation requires solving an old-standing problem of approximating aperiodic Sobolev functions; (iii) If smoothness of the cdf of $X$ is known, then its rate-minimax estimation is possible even if the density of $Z$ is rougher. Real and simulated examples, as well as extensions of the core models to dependent $X$ and Z and case-control CSC, are presented.","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/20-AOS1970","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 1
Abstract
Nonparametric estimation of the cumulative distribution function and the probability density of a lifetime X modified by a current status censoring (CSC), including cases of right and left missing data, is a classical ill-posed problem with biased data. The biased nature of CSC data may preclude us from consistent estimation unless the biasing function is known or may be estimated, and its ill-posed nature slows down rates of convergence. Under a traditionally studied CSC, we observe a sample from $(Z,\Delta )$ where a continuous monitoring time $Z$ is independent of $X$, $\Delta :=I(X\leq Z)$ is the status, and the bias of observations is created by the density of $Z$ which is estimable. In presence of right or left missing, we observe corresponding samples from $(\Delta Z,\Delta )$ or $((1-\Delta )Z,\Delta )$; the data are again biased but now the density of $Z$ cannot be estimated from the data. As a result, to solve the estimation problem, either the density of $Z$ must be known (like in a controlled study) or an extra cross-sectional sampling of $Z$, which is typically simpler than an underlying CSC study, be conducted. The main aim of the paper is to develop for this biased and ill-posed problem the theory of efficient (sharp-minimax) estimation which is inspired by known results for the case of directly observed $X$. Among interesting aspects of the developed theory: (i) While sharp-minimax analysis of missing CSC may follow the classical Pinsker’s methodology, analysis of CSC requires a more complicated estimation procedure based on a special smoothing in both frequency and time domains; (ii) Efficient estimation requires solving an old-standing problem of approximating aperiodic Sobolev functions; (iii) If smoothness of the cdf of $X$ is known, then its rate-minimax estimation is possible even if the density of $Z$ is rougher. Real and simulated examples, as well as extensions of the core models to dependent $X$ and Z and case-control CSC, are presented.