Biometrika | Pub Date: 2024-11-22 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae064
O Dukes, D B Richardson, Z Shahn, J M Robins, E J Tchetgen Tchetgen
{"title":"Using negative controls to identify causal effects with invalid instrumental variables.","authors":"O Dukes, D B Richardson, Z Shahn, J M Robins, E J Tchetgen Tchetgen","doi":"10.1093/biomet/asae064","DOIUrl":"10.1093/biomet/asae064","url":null,"abstract":"Many proposals for the identification of causal effects require an instrumental variable that satisfies strong, untestable unconfoundedness and exclusion restriction assumptions. In this paper, we show how one can potentially identify causal effects under violations of these assumptions by harnessing a negative control population or outcome. This strategy allows one to leverage subpopulations for whom the exposure is degenerate, and requires that the instrument-outcome association satisfies a certain parallel trend condition. We develop semiparametric efficiency theory for a general instrumental variable model, and obtain a multiply robust, locally efficient estimator of the average treatment effect in the treated. The utility of the estimators is demonstrated in simulation studies and an analysis of the Life Span Study.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"112 1","pages":"asae064"},"PeriodicalIF":2.4,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878522/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143566025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-11-04 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae061
C J Wolock, P B Gilbert, N Simon, M Carone
{"title":"Assessing variable importance in survival analysis using machine learning.","authors":"C J Wolock, P B Gilbert, N Simon, M Carone","doi":"10.1093/biomet/asae061","DOIUrl":"10.1093/biomet/asae061","url":null,"abstract":"Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioural factors, contribute to overall predictiveness. Time-to-event outcomes such as time to HIV acquisition are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference and enjoys double robustness. We assess the performance of our proposed procedure via numerical simulations and analyse data from the HVTN 702 vaccine trial to inform enrolment strategies for future HIV vaccine trials.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"112 2","pages":"asae061"},"PeriodicalIF":2.4,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11910984/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-10-17 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae055
Peijun Sang, Dehan Kong, Shu Yang
{"title":"Functional principal component analysis with informative observation times.","authors":"Peijun Sang, Dehan Kong, Shu Yang","doi":"10.1093/biomet/asae055","DOIUrl":"10.1093/biomet/asae055","url":null,"abstract":"Functional principal component analysis has been shown to be invaluable for revealing variation modes of longitudinal outcomes, which serve as important building blocks for forecasting and model building. Decades of research have advanced methods for functional principal component analysis, often assuming independence between the observation times and longitudinal outcomes. Yet such assumptions are fragile in real-world settings where observation times may be driven by outcome-related processes. Rather than ignoring the informative observation time process, we explicitly model the observational times by a general counting process dependent on time-varying prognostic factors. Identification of the mean, covariance function and functional principal components ensues via inverse intensity weighting. We propose using weighted penalized splines for estimation and establish consistency and convergence rates for the weighted estimators. Simulation studies demonstrate that the proposed estimators are substantially more accurate than the existing ones in the presence of a correlation between the observation time process and the longitudinal outcome process. We further examine the finite-sample performance of the proposed method using the Acute Infection and Early Disease Research Program study.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"112 1","pages":"asae055"},"PeriodicalIF":2.4,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771518/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143057923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-09-09 | DOI: 10.1093/biomet/asae046
Tianhai Zu, Yichen Qin
{"title":"Local Bootstrap for Network Data","authors":"Tianhai Zu, Yichen Qin","doi":"10.1093/biomet/asae046","DOIUrl":"https://doi.org/10.1093/biomet/asae046","url":null,"abstract":"In network analysis, we frequently need to conduct inference for network parameters based on one observed network. Since the sampling distribution of the statistic is often unknown, we need to rely on the bootstrap. However, due to the complex dependence structure among vertices, existing bootstrap methods often yield unsatisfactory performance, especially under small or moderate sample sizes. To this end, we propose a new network bootstrap procedure, termed local bootstrap, to estimate the standard errors of network statistics. We propose to resample the observed vertices along with their neighbor sets, and reconstruct the edges between the resampled vertices by drawing from the set of edges connecting their neighbor sets. We justify the proposed method theoretically with desirable asymptotic properties for statistics such as motif density, and demonstrate its excellent numerical performance in small and moderate sample sizes. Our method includes several existing methods, such as the empirical graphon bootstrap, as special cases. We investigate the advantages of the proposed methods over the existing methods through the lens of edge randomness, vertex heterogeneity, neighbor set size, which shed some light on the complex issue of network bootstrapping.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"15 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-08-26 | DOI: 10.1093/biomet/asae045
H Dette, M Kroll
{"title":"A Simple Bootstrap for Chatterjee's Rank Correlation","authors":"H Dette, M Kroll","doi":"10.1093/biomet/asae045","DOIUrl":"https://doi.org/10.1093/biomet/asae045","url":null,"abstract":"We prove that an m out of n bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that m out of n bootstrap works for continuous as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in the Kolmogorov as well as in the Wasserstein distance.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-08-20 | DOI: 10.1093/biomet/asae044
Zhiqiang Tan
{"title":"Sensitivity models and bounds under sequential unmeasured confounding in longitudinal studies","authors":"Zhiqiang Tan","doi":"10.1093/biomet/asae044","DOIUrl":"https://doi.org/10.1093/biomet/asae044","url":null,"abstract":"Consider sensitivity analysis for causal inference in a longitudinal study with time-varying treatments and covariates. It is of interest to assess the worst-case possible values of counterfactual-outcome means and average treatment effects under sequential unmeasured confounding. We formulate several multi-period sensitivity models to relax the corresponding versions of the assumption of sequential non-confounding. The primary sensitivity model involves only counterfactual outcomes, whereas the joint and product sensitivity models involve both counterfactual covariates and outcomes. We establish and compare explicit representations for the sharp and conservative bounds at the population level through convex optimization, depending only on the observed data. These results provide for the first time a satisfactory generalization from the marginal sensitivity model in the cross-sectional setting.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-08-09 | DOI: 10.1093/biomet/asae038
J A Hanley
{"title":"Studies in the history of probability and statistics, LI: the first conditional logistic regression","authors":"J A Hanley","doi":"10.1093/biomet/asae038","DOIUrl":"https://doi.org/10.1093/biomet/asae038","url":null,"abstract":"Statisticians and epidemiologists generally cite the publications by Prentice & Breslow and by Breslow et al. in 1978 as the first description and use of conditional logistic regression, while economists cite the 1973 book chapter by Nobel laureate McFadden. We describe the until-now-unrecognized use of, and way of fitting, this model in 1934 by Lionel Penrose and Ronald Fisher.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"116 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141935803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-07-17 | DOI: 10.1093/biomet/asae036
Canhui Li, Donglin Zeng, Wensheng Zhu
{"title":"Robust Covariate-Balancing Method in Learning Optimal Individualized Treatment Regimes","authors":"Canhui Li, Donglin Zeng, Wensheng Zhu","doi":"10.1093/biomet/asae036","DOIUrl":"https://doi.org/10.1093/biomet/asae036","url":null,"abstract":"One of the most important problems in precision medicine is to find the optimal individualized treatment rule, which is designed to recommend treatment decisions and maximize overall clinical benefit to patients based on their individual characteristics. Typically, the expected clinical outcome is required to be estimated first, in which an outcome regression model or a propensity score model usually needs to be assumed for most of the existing statistical methods. However, if either model assumption is invalid, the estimated treatment regime is not reliable. In this article, we first define a contrast value function, which is the basis of the study for individualized treatment regimes. Then we construct a hybrid estimator of the contrast value function, by combining two types of estimation methods. We further propose a robust covariate-balancing estimator of the contrast value function by combining the inverse probability weighted method and matching method, which is based on the covariate balancing propensity score proposed by Imai and Ratkovic (2014). Theoretical results show that the proposed estimator is doubly robust, that is, it is consistent if either the propensity score model or the matching is correct. Based on a large number of simulation studies, we demonstrate that the proposed estimator outperforms existing methods. Lastly, the proposed method is illustrated through analysis of the SUPPORT study.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"337 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141740777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-07-13 | DOI: 10.1093/biomet/asae037
AmirEmad Ghassami, Alan Yang, Ilya Shpitser, Eric Tchetgen Tchetgen
{"title":"Causal inference with hidden mediators","authors":"AmirEmad Ghassami, Alan Yang, Ilya Shpitser, Eric Tchetgen Tchetgen","doi":"10.1093/biomet/asae037","DOIUrl":"https://doi.org/10.1093/biomet/asae037","url":null,"abstract":"Proximal causal inference was recently proposed as a framework to identify causal effects from observational data in the presence of hidden confounders for which proxies are available. In this paper, we extend the proximal causal inference approach to settings where identification of causal effects hinges upon a set of mediators which are not observed, yet error prone proxies of the hidden mediators are measured. Specifically, (i) we establish causal hidden mediation analysis, which extends classical causal mediation analysis methods for identifying natural direct and indirect effects under no unmeasured confounding to a setting where the mediator of interest is hidden, but proxies of it are available. (ii) We establish a hidden front-door criterion, which extends the classical front-door criterion to allow for hidden mediators for which proxies are available. (iii) We show that the identification of a certain causal effect called population intervention indirect effect remains possible with hidden mediators in settings where challenges in (i) and (ii) might co-exist. We view (i)-(iii) as important steps towards the practical application of front-door criteria and mediation analysis as mediators are almost always measured with error and thus, the most one can hope for in practice is that the measurements are at best proxies of mediating mechanisms. We propose identification approaches for the parameters of interest in our considered models. For the estimation aspect, we propose an influence function-based estimation method and provide an analysis for the robustness of the estimators.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"249 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biometrika | Pub Date: 2024-07-10 | DOI: 10.1093/biomet/asae031
Nick W Koning
{"title":"More Power by Using Fewer Permutations","authors":"Nick W Koning","doi":"10.1093/biomet/asae031","DOIUrl":"https://doi.org/10.1093/biomet/asae031","url":null,"abstract":"It is conventionally believed that permutation-based testing methods should ideally use all permutations. We challenge this by showing we can sometimes obtain dramatically more power by using a tiny subgroup. As the subgroup is tiny, this also comes at a much lower computational cost. Moreover, the method remains valid for the same hypotheses. We exploit this to improve the popular permutation-based Westfall & Young MaxT multiple testing method. We analyze the relative efficiency in a Gaussian location model, and find the largest gain in high dimensions.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"377 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141585906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}