Biometrika. Pub Date: 2024-12-17; eCollection Date: 2025-01-01; DOI: 10.1093/biomet/asae069
Chenyin Gao, Shu Yang, Mingyang Shan, Wenyu Ye, Ilya Lipkovich, Douglas Faries
Title: Improving randomized controlled trial analysis via data-adaptive borrowing.
Journal: Biometrika 112(2), asae069. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972012/pdf/
Abstract: In recent years, real-world external controls have grown in popularity as a tool to empower randomized placebo-controlled trials, particularly in rare diseases or cases where balanced randomization is unethical or impractical. However, as external controls are not always comparable to the trials, direct borrowing without scrutiny may heavily bias the treatment effect estimator. Our paper proposes a data-adaptive integrative framework capable of preventing unknown biases of the external controls. The adaptive nature is achieved by dynamically sorting out a comparable subset of external controls via bias penalization. Our proposed method can simultaneously achieve (a) the semiparametric efficiency bound when the external controls are comparable and (b) selective borrowing that mitigates the impact of the existence of incomparable external controls. Furthermore, we establish statistical guarantees, including consistency, asymptotic distribution and inference, providing Type-I error control and good power. Extensive simulations and two real-data applications show that the proposed method leads to improved performance over the trial-only estimator across various bias-generating scenarios.
Biometrika. Pub Date: 2024-12-01; Epub Date: 2024-06-14; DOI: 10.1093/biomet/asae029
Yichen Zhu, Michele Peruzzi, Cheng Li, David B Dunson
Title: Radial Neighbors for Provably Accurate Scalable Approximations of Gaussian Processes.
Journal: Biometrika 111(4), 1151-1167. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11993192/pdf/
Abstract: In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable O(n) computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leads to unclear guidance in specifying the underlying graphical model and sensitivity to graph choice. We address these issues by introducing radial neighbors Gaussian processes (RadGP), a class of Gaussian processes based on directed acyclic graphs in which directed edges connect every location to all of its neighbors within a predetermined radius. We prove that any radial neighbors Gaussian process can accurately approximate the corresponding unrestricted Gaussian process in Wasserstein-2 distance, with an error rate determined by the approximation radius, the spatial covariance function, and the spatial dispersion of samples. We offer further empirical validation of our approach via applications on simulated and real world data showing excellent performance in both prior and posterior approximations to the original Gaussian process.
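The abstract describes Gaussian-process approximations built on directed acyclic graphs whose edges join each location to its neighbors within a fixed radius. A minimal Python sketch of such a radius-based DAG likelihood follows, assuming a zero-mean process, an exponential covariance and a fixed ordering of locations, with the parents of each location taken to be the earlier locations within the radius. The helper names (exp_cov, radial_parents, dag_gp_logpdf, exact_logpdf) are hypothetical, and the paper's RadGP construction may differ in details such as the ordering or the use of a reference set.

```python
import numpy as np


def exp_cov(X1, X2, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * ||x - x'||) (an assumed choice)."""
    d = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def radial_parents(X, radius):
    """Parents of location i: all earlier locations (in the given order) within radius."""
    return [np.flatnonzero(np.linalg.norm(X[:i] - X[i], axis=1) <= radius)
            for i in range(len(X))]


def dag_gp_logpdf(w, X, parents, cov=exp_cov):
    """Log-density of w under prod_i p(w_i | w_parents(i)) for a zero-mean GP."""
    total = 0.0
    for i, pa in enumerate(parents):
        xi = X[i:i + 1]
        var = cov(xi, xi)[0, 0]
        mean = 0.0
        if len(pa) > 0:
            C_pp = cov(X[pa], X[pa])          # covariance among the parents
            c_ip = cov(xi, X[pa])[0]          # covariance between w_i and its parents
            coef = np.linalg.solve(C_pp, c_ip)
            mean = coef @ w[pa]               # conditional (kriging) mean
            var = var - coef @ c_ip           # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (w[i] - mean) ** 2 / var)
    return total


def exact_logpdf(w, X, cov=exp_cov):
    """Exact zero-mean Gaussian log-density via a Cholesky factor, for comparison."""
    L = np.linalg.cholesky(cov(X, X))
    z = np.linalg.solve(L, w)
    return -0.5 * (len(w) * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L))) + z @ z)


rng = np.random.default_rng(1)
X = rng.random((200, 2))
w = np.linalg.cholesky(exp_cov(X, X)) @ rng.standard_normal(200)  # one GP draw
for r in (0.05, 0.1, 0.2):
    # approximation error in log-density as the radius grows
    print(r, dag_gp_logpdf(w, X, radial_parents(X, r)) - exact_logpdf(w, X))
```

When the radius exceeds the diameter of the domain, every earlier location is a parent and, by the chain rule, the product of conditionals reproduces the exact Gaussian log-density; smaller radii trade accuracy for cheaper conditioning.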
Biometrika. Pub Date: 2024-11-22; eCollection Date: 2025-01-01; DOI: 10.1093/biomet/asae064
O Dukes, D B Richardson, Z Shahn, J M Robins, E J Tchetgen Tchetgen
Title: Using negative controls to identify causal effects with invalid instrumental variables.
Journal: Biometrika 112(1), asae064. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878522/pdf/
Abstract: Many proposals for the identification of causal effects require an instrumental variable that satisfies strong, untestable unconfoundedness and exclusion restriction assumptions. In this paper, we show how one can potentially identify causal effects under violations of these assumptions by harnessing a negative control population or outcome. This strategy allows one to leverage subpopulations for whom the exposure is degenerate, and requires that the instrument-outcome association satisfies a certain parallel trend condition. We develop semiparametric efficiency theory for a general instrumental variable model, and obtain a multiply robust, locally efficient estimator of the average treatment effect in the treated. The utility of the estimators is demonstrated in simulation studies and an analysis of the Life Span Study.
Biometrika. Pub Date: 2024-11-04; eCollection Date: 2025-01-01; DOI: 10.1093/biomet/asae061
C J Wolock, P B Gilbert, N Simon, M Carone
Title: Assessing variable importance in survival analysis using machine learning.
Journal: Biometrika 112(2), asae061. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11910984/pdf/
Abstract: Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioural factors, contribute to overall predictiveness. Time-to-event outcomes such as time to HIV acquisition are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference and enjoys double robustness. We assess the performance of our proposed procedure via numerical simulations and analyse data from the HVTN 702 vaccine trial to inform enrolment strategies for future HIV vaccine trials.
Biometrika. Pub Date: 2024-10-17; eCollection Date: 2025-01-01; DOI: 10.1093/biomet/asae055
Peijun Sang, Dehan Kong, Shu Yang
Title: Functional principal component analysis with informative observation times.
Journal: Biometrika 112(1), asae055. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771518/pdf/
Abstract: Functional principal component analysis has been shown to be invaluable for revealing variation modes of longitudinal outcomes, which serve as important building blocks for forecasting and model building. Decades of research have advanced methods for functional principal component analysis, often assuming independence between the observation times and longitudinal outcomes. Yet such assumptions are fragile in real-world settings where observation times may be driven by outcome-related processes. Rather than ignoring the informative observation time process, we explicitly model the observation times by a general counting process dependent on time-varying prognostic factors. Identification of the mean, covariance function and functional principal components ensues via inverse intensity weighting. We propose using weighted penalized splines for estimation and establish consistency and convergence rates for the weighted estimators. Simulation studies demonstrate that the proposed estimators are substantially more accurate than the existing ones in the presence of a correlation between the observation time process and the longitudinal outcome process. We further examine the finite-sample performance of the proposed method using the Acute Infection and Early Disease Research Program study.
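The abstract identifies the mean function by inverse intensity weighting. As a hedged illustration only, the sketch below swaps in a Gaussian-kernel (local-constant) smoother for the paper's weighted penalized splines, assumes a proportional intensity model whose subject-level factor exp(gamma * z) is known, and uses a hypothetical simulation with a time-invariant prognostic factor z that raises both the outcome and the visit rate. It shows why unweighted pooling of visits is biased and how down-weighting frequently observed subjects removes that bias; it is not the authors' estimator.

```python
import numpy as np


def iiw_mean(t_grid, t_obs, y_obs, intensity, bandwidth=0.1):
    """Inverse-intensity-weighted local-constant estimate of the mean function.

    t_obs, y_obs : pooled observation times and outcomes across subjects.
    intensity    : assumed (known or pre-estimated) relative observation intensity
                   at each observation; a common baseline factor cancels in the ratio.
    """
    t_obs = np.asarray(t_obs, float)
    y_obs = np.asarray(y_obs, float)
    w = 1.0 / np.asarray(intensity, float)            # inverse intensity weights
    # Gaussian kernel weights: rows are grid points, columns are observations
    K = np.exp(-0.5 * ((t_grid[:, None] - t_obs[None, :]) / bandwidth) ** 2)
    return (K * w) @ y_obs / (K @ w)


# Hypothetical simulation: subjects with larger z have larger outcomes and visit more often.
rng = np.random.default_rng(0)
gamma = 1.0
t_all, y_all, lam_all = [], [], []
for _ in range(2000):
    z = rng.standard_normal()                          # subject-level prognostic factor
    lam = np.exp(gamma * z)                            # relative visit intensity
    t = rng.random(rng.poisson(5 * lam))               # visit times on [0, 1]
    y = np.sin(2 * np.pi * t) + z + 0.5 * rng.standard_normal(t.size)
    t_all.append(t)
    y_all.append(y)
    lam_all.append(np.full(t.size, lam))
t_all, y_all, lam_all = map(np.concatenate, (t_all, y_all, lam_all))

grid = np.linspace(0.2, 0.8, 13)
truth = np.sin(2 * np.pi * grid)
naive = iiw_mean(grid, t_all, y_all, np.ones_like(t_all), bandwidth=0.05)
weighted = iiw_mean(grid, t_all, y_all, lam_all, bandwidth=0.05)
print(np.abs(naive - truth).max(), np.abs(weighted - truth).max())
```

In this toy model the unweighted estimate is shifted upward by roughly gamma, because subjects with large z contribute more visits; the inverse-intensity weights undo that over-representation.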
Biometrika. Pub Date: 2024-09-09; DOI: 10.1093/biomet/asae046
Tianhai Zu, Yichen Qin
Title: Local Bootstrap for Network Data
Abstract: In network analysis, we frequently need to conduct inference for network parameters based on one observed network. Since the sampling distribution of the statistic is often unknown, we need to rely on the bootstrap. However, due to the complex dependence structure among vertices, existing bootstrap methods often yield unsatisfactory performance, especially under small or moderate sample sizes. To this end, we propose a new network bootstrap procedure, termed local bootstrap, to estimate the standard errors of network statistics. We propose to resample the observed vertices along with their neighbor sets, and reconstruct the edges between the resampled vertices by drawing from the set of edges connecting their neighbor sets. We justify the proposed method theoretically with desirable asymptotic properties for statistics such as motif density, and demonstrate its excellent numerical performance in small and moderate sample sizes. Our method includes several existing methods, such as the empirical graphon bootstrap, as special cases. We investigate the advantages of the proposed methods over the existing methods through the lens of edge randomness, vertex heterogeneity and neighbor set size, which sheds some light on the complex issue of network bootstrapping.
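The resampling step in the abstract (resample vertices with their neighbor sets, then draw each new edge from the edges connecting those neighbor sets) can be sketched as follows. This is a simplified illustration, not the authors' procedure: how neighbor sets are built is left to the caller (the usage lines apply a hypothetical rule based on adjacency-row similarity), and an edge is left absent when no pair of distinct vertices is available. With neighbor_sets[v] = [v], each new edge between distinct resampled vertices is simply A[idx[s], idx[t]], as in the empirical graphon bootstrap.

```python
import numpy as np


def local_bootstrap_se(A, neighbor_sets, statistic, n_boot=200, seed=0):
    """Bootstrap standard error of statistic(A) via a simplified local bootstrap.

    A             : symmetric 0/1 adjacency matrix (n x n) with zero diagonal.
    neighbor_sets : neighbor_sets[v] lists vertices regarded as similar to v
                    (their construction is an assumption left to the caller).
    statistic     : function mapping an adjacency matrix to a number.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A)
    n = A.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample vertices with replacement
        B = np.zeros((n, n), dtype=A.dtype)
        for s in range(n):
            for t in range(s + 1, n):
                # candidate edges: distinct pairs (a, c), a in N(idx[s]), c in N(idx[t])
                pairs = [(a, c) for a in neighbor_sets[idx[s]]
                         for c in neighbor_sets[idx[t]] if a != c]
                if pairs:                             # otherwise the edge stays absent
                    a, c = pairs[rng.integers(len(pairs))]
                    B[s, t] = B[t, s] = A[a, c]
        stats[b] = statistic(B)
    return stats.std(ddof=1)


def edge_density(A):
    """Fraction of vertex pairs joined by an edge."""
    iu = np.triu_indices(A.shape[0], k=1)
    return A[iu].mean()


# Usage on a random graph with hypothetical neighbor sets:
# each vertex plus the 2 other vertices whose adjacency rows differ least from its own.
rng = np.random.default_rng(42)
n = 40
U = np.triu(rng.random((n, n)) < 0.2, k=1).astype(int)
A = U + U.T
dist = (A[:, None, :] != A[None, :, :]).sum(axis=-1)
neighbor_sets = []
for v in range(n):
    order = np.argsort(dist[v], kind="stable")
    neighbor_sets.append([v] + [int(u) for u in order if u != v][:2])
print(edge_density(A), local_bootstrap_se(A, neighbor_sets, edge_density, n_boot=100))
```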
Biometrika. Pub Date: 2024-08-26; DOI: 10.1093/biomet/asae045
H Dette, M Kroll
Title: A Simple Bootstrap for Chatterjee's Rank Correlation
Abstract: We prove that an m out of n bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that m out of n bootstrap works for continuous as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in the Kolmogorov as well as in the Wasserstein distance.
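For reference, Chatterjee's coefficient sorts the pairs by x and compares consecutive ranks of y; with r_i = #{j : y_j <= y_(i)} and l_i = #{j : y_j >= y_(i)}, the tie-corrected form is xi_n = 1 - n * sum_i |r_{i+1} - r_i| / (2 * sum_i l_i (n - l_i)). The sketch below computes it and builds a basic m out of n bootstrap interval, assuming resampling with replacement and a user-chosen m; the function names and the choice m = 200 in the usage lines are hypothetical, and this is not the authors' code.

```python
import numpy as np


def chatterjee_xi(x, y, rng=None):
    """Chatterjee's rank correlation xi_n(x, y), using the tie-corrected formula."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Sort the pairs by x; ties in x are broken uniformly at random.
    order = np.lexsort((rng.random(n), x))
    y_sorted = y[order]
    y_all = np.sort(y)
    r = np.searchsorted(y_all, y_sorted, side="right")      # r_i = #{j : y_j <= y_(i)}
    l = n - np.searchsorted(y_all, y_sorted, side="left")   # l_i = #{j : y_j >= y_(i)}
    denom = 2.0 * np.sum(l * (n - l))
    if denom == 0:                                           # constant y: xi is undefined
        return np.nan
    return 1.0 - n * np.sum(np.abs(np.diff(r))) / denom


def m_out_of_n_ci(x, y, m, n_boot=2000, alpha=0.05, seed=0):
    """Basic (pivotal) confidence interval for xi from an m out of n bootstrap.

    The law of sqrt(m) * (xi*_m - xi_n) over resamples of size m, drawn with
    replacement, serves as a proxy for the law of sqrt(n) * (xi_n - xi).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xi_n = chatterjee_xi(x, y, rng)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=m)
        roots[b] = np.sqrt(m) * (chatterjee_xi(x[idx], y[idx], rng) - xi_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return xi_n - q_hi / np.sqrt(n), xi_n - q_lo / np.sqrt(n)


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.sin(3 * x) + 0.3 * rng.standard_normal(1000)   # non-monotone dependence
print(chatterjee_xi(x, y, rng), m_out_of_n_ci(x, y, m=200))
```

With y = x and no ties the formula reduces to 1 - 3/(n + 1), so xi_n only approaches 1 as n grows.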
Biometrika. Pub Date: 2024-08-20; DOI: 10.1093/biomet/asae044
Zhiqiang Tan
Title: Sensitivity models and bounds under sequential unmeasured confounding in longitudinal studies
Abstract: Consider sensitivity analysis for causal inference in a longitudinal study with time-varying treatments and covariates. It is of interest to assess the worst-case possible values of counterfactual-outcome means and average treatment effects under sequential unmeasured confounding. We formulate several multi-period sensitivity models to relax the corresponding versions of the assumption of sequential non-confounding. The primary sensitivity model involves only counterfactual outcomes, whereas the joint and product sensitivity models involve both counterfactual covariates and outcomes. We establish and compare explicit representations for the sharp and conservative bounds at the population level through convex optimization, depending only on the observed data. These results provide for the first time a satisfactory generalization from the marginal sensitivity model in the cross-sectional setting.
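The abstract presents its bounds as a generalization of the cross-sectional marginal sensitivity model. For orientation only, the sketch below covers that cross-sectional case: it bounds a normalized inverse-probability-weighted estimate of E[Y(1)] from treated units when the true propensity odds may differ from the nominal ones by a factor of at most Lambda. It assumes the nominal propensity scores are given, yields valid but not necessarily sharp sample bounds, and does not implement the paper's longitudinal sensitivity models; the function names are hypothetical.

```python
import numpy as np


def _max_weighted_mean(y, a, b):
    """Maximum over weights a_i <= w_i <= b_i of sum(w * y) / sum(w).

    The optimum puts the upper weight on the largest outcomes and the lower weight
    on the rest, so scanning the n + 1 sorted thresholds is enough.
    """
    order = np.argsort(-y)
    y, a, b = y[order], a[order], b[order]
    best = -np.inf
    for k in range(len(y) + 1):
        w = np.concatenate([b[:k], a[k:]])
        best = max(best, float(w @ y / w.sum()))
    return best


def msm_mean_bounds(y_treated, e_treated, lam):
    """Lower and upper bounds on E[Y(1)] under a marginal sensitivity model.

    e_treated : nominal propensity scores P(A = 1 | X) of the treated units.
    lam       : sensitivity parameter Lambda >= 1 bounding the odds ratio between
                the true and nominal propensity scores.
    """
    y = np.asarray(y_treated, float)
    odds_inv = 1.0 / np.asarray(e_treated, float) - 1.0   # (1 - e) / e
    a = 1.0 + odds_inv / lam                               # smallest allowed inverse propensity
    b = 1.0 + odds_inv * lam                               # largest allowed inverse propensity
    upper = _max_weighted_mean(y, a, b)
    lower = -_max_weighted_mean(-y, a, b)
    return lower, upper


rng = np.random.default_rng(0)
e = rng.uniform(0.2, 0.8, 300)
y = 1 + rng.standard_normal(300)
for lam in (1.0, 1.5, 2.0):
    print(lam, msm_mean_bounds(y, e, lam))
```

At Lambda = 1 both bounds collapse to the usual normalized IPW estimate, and the interval widens as Lambda grows.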
Biometrika. Pub Date: 2024-08-09; DOI: 10.1093/biomet/asae038
J A Hanley
Title: Studies in the history of probability and statistics, LI: the first conditional logistic regression
Abstract: Statisticians and epidemiologists generally cite the publications by Prentice & Breslow and by Breslow et al. in 1978 as the first description and use of conditional logistic regression, while economists cite the 1973 book chapter by Nobel laureate McFadden. We describe the until-now-unrecognized use of, and way of fitting, this model in 1934 by Lionel Penrose and Ronald Fisher.
Biometrika. Pub Date: 2024-07-17; DOI: 10.1093/biomet/asae036
Canhui Li, Donglin Zeng, Wensheng Zhu
Title: Robust Covariate-Balancing Method in Learning Optimal Individualized Treatment Regimes
Abstract: One of the most important problems in precision medicine is to find the optimal individualized treatment rule, which is designed to recommend treatment decisions and maximize overall clinical benefit to patients based on their individual characteristics. Most existing statistical methods first estimate the expected clinical outcome, which usually requires assuming an outcome regression model or a propensity score model. However, if the assumed model is invalid, the estimated treatment regime is not reliable. In this article, we first define a contrast value function, which is the basis of the study for individualized treatment regimes. Then we construct a hybrid estimator of the contrast value function by combining two types of estimation methods. We further propose a robust covariate-balancing estimator of the contrast value function by combining the inverse probability weighted method and the matching method, based on the covariate balancing propensity score proposed by Imai and Ratkovic (2014). Theoretical results show that the proposed estimator is doubly robust, that is, it is consistent if either the propensity score model or the matching is correct. Through extensive simulation studies, we demonstrate that the proposed estimator outperforms existing methods. Lastly, the proposed method is illustrated through analysis of the SUPPORT study.
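As background for the value-based criteria in this abstract, the sketch below uses a generic normalized inverse-probability-weighted estimate of the value E[Y(d)] of a treatment rule d and searches a hypothetical family of threshold rules "treat if x > c" in simulated data. This is a swapped-in textbook estimator with a known propensity score, not the paper's hybrid or covariate-balancing estimator, and unlike that estimator it is not doubly robust.

```python
import numpy as np


def ipw_value(y, a, prop, treat):
    """Normalized IPW estimate of E[Y(d)] for a rule whose recommendations are treat (0/1).

    prop is P(A = 1 | X) for each subject (assumed known or estimated elsewhere).
    """
    y, a, prop, treat = (np.asarray(v, float) for v in (y, a, prop, treat))
    p_obs = np.where(a == 1, prop, 1 - prop)      # probability of the treatment received
    w = (a == treat) / p_obs                      # only subjects who followed the rule count
    return np.sum(w * y) / np.sum(w)


# Hypothetical simulation: treatment helps only when x > 0, and assignment depends on x.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1, 1, n)
prop = 1 / (1 + np.exp(-x))                       # confounded assignment probability
a = rng.binomial(1, prop)
y = 1 + 2 * a * x + 0.5 * rng.standard_normal(n)
cands = np.linspace(-0.8, 0.8, 17)
values = [ipw_value(y, a, prop, (x > c).astype(int)) for c in cands]
best_c = cands[int(np.argmax(values))]
print(best_c)
```

In this toy model the true value of the rule with threshold c is 1 + (1 - c^2)/2, so the search should land near c = 0.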