BiometrikaPub Date : 2024-02-19DOI: 10.1093/biomet/asae010
Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou
{"title":"Selective conformal inference with false coverage-statement rate control","authors":"Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou","doi":"10.1093/biomet/asae010","DOIUrl":"https://doi.org/10.1093/biomet/asae010","url":null,"abstract":"Conformal inference is a popular tool for constructing prediction intervals. We consider here the scenario of post-selection/selective conformal inference, that is prediction intervals are reported only for individuals selected from unlabelled test data. To account for multiplicity, we develop a general split conformal framework to construct selective prediction intervals with the false coverage-statement rate control. We first investigate benjamini2005false's false coverage rate-adjusted method in the present setting, and show that it is able to achieve false coverage-statement rate control but yields uniformly inflated prediction intervals. We then propose a novel solution to the problem called selective conditional conformal prediction. Our method performs selection procedures on both the calibration set and test set, and then constructs conformal prediction intervals for the selected test candidates with the aid of conditional empirical distribution obtained by the post-selection calibration set. When the selection rule is exchangeable, we show that our proposed method can exactly control the false coverage-statement rate in a model-free and distribution-free guarantee. For non-exchangeable selection procedures involving the calibration set, we provide non-asymptotic bounds for the false coverage-statement rate under mild distributional assumptions. Numerical results confirm the effectiveness and robustness of our method in false coverage-statement rate control and show that it achieves more narrowed prediction intervals over existing methods across various settings.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"21 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139950365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-02-17DOI: 10.1093/biomet/asae008
Ying Zhou, Dingke Tang, Dehan Kong, Linbo Wang
{"title":"The Promises of Parallel Outcomes","authors":"Ying Zhou, Dingke Tang, Dehan Kong, Linbo Wang","doi":"10.1093/biomet/asae008","DOIUrl":"https://doi.org/10.1093/biomet/asae008","url":null,"abstract":"A key challenge in causal inference from observational studies is the identification and estimation of causal effects in the presence of unmeasured confounding. In this paper, we introduce a novel approach for causal inference that leverages information in multiple outcomes to deal with unmeasured confounding. An important assumption in our approach is conditional independence among multiple outcomes. In contrast to existing proposals in the literature, the roles of multiple outcomes in the conditional independence assumption are symmetric, hence the name parallel outcomes. We show nonparametric identifiability with at least three parallel outcomes and provide parametric estimation tools under a set of linear structural equation models. Our proposal is evaluated through a set of synthetic and real data analyses.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"191 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139924069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-02-11DOI: 10.1093/biomet/asae005
Yuyao Wang, Andrew Ying, Ronghui Xu
{"title":"Doubly robust estimation under covariate-induced dependent left truncation","authors":"Yuyao Wang, Andrew Ying, Ronghui Xu","doi":"10.1093/biomet/asae005","DOIUrl":"https://doi.org/10.1093/biomet/asae005","url":null,"abstract":"Summary In prevalent cohort studies with follow-up, the time-to-event outcome is subject to left truncation leading to selection bias. For estimation of the distribution of time-to-event, conventional methods adjusting for left truncation tend to rely on the quasi-independence assumption that the truncation time and the event time are independent on the observed region. This assumption is violated when there is dependence between the truncation time and the event time possibly induced by measured covariates. Inverse probability of truncation weighting can be used in this case, but it is sensitive to misspecification of the truncation model. In this work, we apply semiparametric theory to find the efficient influence curve of the expectation of an arbitrarily transformed survival time in the presence of covariate-induced dependent left truncation. We then use it to construct estimators that are shown to enjoy double-robustness properties. Our work represents the first attempt to construct doubly robust estimators in the presence of left truncation, which does not fall under the established framework of coarsened data where doubly robust approaches were developed. We provide technical conditions for the asymptotic properties that appear to not have been carefully examined in the literature for time-to-event data, and study the estimators via extensive simulation. We apply the estimators to two datasets from practice, with different right-censoring patterns.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"45 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139769979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-02-08DOI: 10.1093/biomet/asae006
Shuwei Li, Tao Hu, Lianming Wang, Christopher S McMahan, Joshua M Tebbs
{"title":"Regression analysis of group-tested current status data","authors":"Shuwei Li, Tao Hu, Lianming Wang, Christopher S McMahan, Joshua M Tebbs","doi":"10.1093/biomet/asae006","DOIUrl":"https://doi.org/10.1093/biomet/asae006","url":null,"abstract":"Summary Group testing is an effective way to reduce the time and cost associated with conducting large-scale screening for infectious diseases. Benefits are realized through testing pools formed by combining specimens, such as blood or urine, from different individuals. In some studies, individuals are assessed only once and a time-to-event endpoint is recorded, for example, the time until infection. Combining group testing with this type of endpoint results in group-tested current status data (?). To analyse these complex data, we propose methods which estimate a proportional hazards regression model based on test outcomes from measuring the pools. A sieve maximum likelihood estimation approach is developed that approximates the cumulative baseline hazard function with a piecewise constant function. To identify the sieve estimator, a computationally efficient expectation-maximization algorithm is derived by using data augmentation. Asymptotic properties of both the parametric and nonparametric components of the sieve estimator are then established by applying modern empirical process theory. Numerical results from simulation studies show that our proposed method performs nominally and has advantages over the corresponding estimation method based on individual testing results. We illustrate our work by analysing a chlamydia dataset collected by the State Hygienic Laboratory at the University of Iowa.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"25 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139770356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-02-08DOI: 10.1093/biomet/asae007
Karim M Abadir, Michel Lubrano
{"title":"Explicit solutions for the asymptotically-optimal bandwidth in cross-validation","authors":"Karim M Abadir, Michel Lubrano","doi":"10.1093/biomet/asae007","DOIUrl":"https://doi.org/10.1093/biomet/asae007","url":null,"abstract":"Summary We show that least squares cross-validation methods share a common structure which has an explicit asymptotic solution, when the chosen kernel is asymptotically separable in bandwidth and data. For density estimation with a multivariate Student t(ν) kernel, the cross-validation criterion becomes asymptotically equivalent to a polynomial of only three terms. Our bandwidth formulae are simple and noniterative thus leading to very fast computations, their integrated squared-error dominates traditional cross-validation implementations, they alleviate the notorious sample variability of cross-validation, and overcome its breakdown in the case of repeated observations. We illustrate our method with univariate and bivariate applications, of density estimation and nonparametric regressions, to a large dataset of Michigan State University academic wages and experience.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"3 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139770360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-02-04DOI: 10.1093/biomet/asae004
Zhexiao Lin, Fang Han
{"title":"On the failure of the bootstrap for Chatterjee's rank correlation","authors":"Zhexiao Lin, Fang Han","doi":"10.1093/biomet/asae004","DOIUrl":"https://doi.org/10.1093/biomet/asae004","url":null,"abstract":"Summary While researchers commonly use the bootstrap to quantify the uncertainty of an estimator, it has been noticed that the standard bootstrap, in general, does not work for Chatterjee's rank correlation. In this paper, we provide proof of this issue under an additional independence assumption, and complement our theory with simulation evidence for general settings. Chatterjee's rank correlation thus falls into a category of statistics that are asymptotically normal but bootstrap inconsistent. Valid inferential methods in this case are Chatterjee's original proposal for testing independence and Lin & Han (2022) 's analytic asymptotic variance estimator for more general purposes.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"125 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139770362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-02-03DOI: 10.1093/biomet/asae003
K W Chan, C Y Yau
{"title":"Asymptotically constant risk estimator of the time-average variance constant","authors":"K W Chan, C Y Yau","doi":"10.1093/biomet/asae003","DOIUrl":"https://doi.org/10.1093/biomet/asae003","url":null,"abstract":"Summary Estimation of the time-average variance constant is important for statistical analyses involving dependent data. This problem is difficult as it relies on a bandwidth parameter. Specifically, the optimal choices of the bandwidths of all existing estimators depend on the estimand itself and another unknown parameter which is very difficult to estimate. Thus, optimal variance estimation is unachievable. In this paper, we introduce a concept of converging flat-top kernels for constructing variance estimators whose optimal bandwidths are free of unknown parameters asymptotically and hence can be computed easily. We prove that the new estimator has an asymptotically constant risk and is locally asymptotically minimax.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"16 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139678925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-01-20DOI: 10.1093/biomet/asae001
D P Wiens
{"title":"A note on minimax robustness of designs against correlated or heteroscedastic responses","authors":"D P Wiens","doi":"10.1093/biomet/asae001","DOIUrl":"https://doi.org/10.1093/biomet/asae001","url":null,"abstract":"Summary We present a result according to which certain functions of covariance matrices are maximized at scalar multiples of the identity matrix. This is used to show that experimental designs that are optimal under an assumption of independent, homoscedastic responses can be minimax robust, in broad classes of alternate covariance structures. In particular it can justify the common practice of disregarding possible dependence, or heteroscedasticity, at the design stage of an experiment.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"47 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139506157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2024-01-17DOI: 10.1093/biomet/asae002
K Klockmann, T Krivobokova
{"title":"Efficient nonparametric estimation of Toeplitz covariance matrices","authors":"K Klockmann, T Krivobokova","doi":"10.1093/biomet/asae002","DOIUrl":"https://doi.org/10.1093/biomet/asae002","url":null,"abstract":"A new efficient nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the discrete cosine transform is proposed. The method is implemented in the R package vstdct that accompanies the paper.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139506380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiometrikaPub Date : 2023-12-22DOI: 10.1093/biomet/asad078
Jelle J Goeman, Aldo Solari
{"title":"On Selecting and Conditioning in Multiple Testing and Selective Inference","authors":"Jelle J Goeman, Aldo Solari","doi":"10.1093/biomet/asad078","DOIUrl":"https://doi.org/10.1093/biomet/asad078","url":null,"abstract":"We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"94 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139051164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}