Biometrika · Pub Date: 2022-09-28 · DOI: 10.1093/biomet/asac054
P. Rosenbaum, D. Rubin
Propensity Scores in the Design of Observational Studies for Causal Effects
Abstract: The design of any study, whether experimental or observational, that is intended to estimate the causal effects of a treatment condition relative to a control condition, refers to those activities that precede any examination of outcome variables. As defined in our 1983 article (Rosenbaum & Rubin, 1983), the propensity score is the unit-level conditional probability of assignment to treatment versus control given the observed covariates; so, the propensity score explicitly does not involve any outcome variables, in contrast to other summaries of variables sometimes used in observational studies. Balancing the distributions of covariates in the treatment and control groups by matching or balancing on the propensity score is therefore an aspect of the design of the observational study. In this invited comment on our 1983 article, we review the situation in the early 1980s, and we recall some apparent paradoxes that propensity scores helped to resolve. We demonstrate that it is possible to balance an enormous number of low-dimensional summaries of a high-dimensional covariate, even though it is generally impossible to match individuals closely for all of the components of a high-dimensional covariate. In a sense, there is only one crucial observed covariate, the propensity score, and there is one crucial unobserved covariate, the ‘principal unobserved covariate’. The propensity score and the principal unobserved covariate are equal when treatment assignment is strongly ignorable, that is, unconfounded. Controlling for observed covariates is a prelude to the crucial step from association to causation, the step that addresses potential biases from unmeasured covariates.
The design of an observational study also prepares for the step to causation: by selecting comparisons to increase the design sensitivity, by seeking opportunities to detect bias, by seeking mutually supportive evidence affected by different biases, by incorporating quasi-experimental devices such as multiple control groups, and by including the economist’s instruments. All of these considerations reflect the formal development of sensitivity analyses that were largely informal prior to the 1980s.
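The design-stage point above is that the propensity score e(x) = P(Z = 1 | X = x) is built from the treatment indicator Z and the covariates X alone, with no outcome involved, and that balancing on it balances the covariates. A minimal sketch, assuming a single discrete covariate so that e(x) can be estimated as the treated fraction at each covariate level (with continuous or many covariates one would typically fit a model such as logistic regression instead); the function names and toy data are hypothetical. Levels "a" and "c" share the same estimated score, and within that stratum the treated and control groups have the same mix of "a" and "c". In this toy data the balance is exact; in real samples it holds only approximately.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def estimate_propensity(units):
    """Estimate e(x) = P(Z = 1 | X = x) for one discrete covariate.

    `units` holds (x, z) pairs with z in {0, 1}; outcomes are deliberately
    absent because the propensity score is a design-stage quantity.
    """
    treated = Counter(x for x, z in units if z == 1)
    total = Counter(x for x, _ in units)
    return {x: Fraction(treated[x], total[x]) for x in total}


def balance_within_strata(units, e):
    """Covariate distribution among treated and controls in each score stratum."""
    counts = defaultdict(lambda: (Counter(), Counter()))
    for x, z in units:
        treated_counts, control_counts = counts[e[x]]
        (treated_counts if z == 1 else control_counts)[x] += 1
    result = {}
    for score, (t, c) in counts.items():
        n_t, n_c = sum(t.values()), sum(c.values())
        result[score] = (
            {x: Fraction(k, n_t) for x, k in t.items()},
            {x: Fraction(k, n_c) for x, k in c.items()},
        )
    return result


# Hypothetical toy data: covariate level and treatment indicator only.
units = [("a", 1), ("a", 1), ("a", 0), ("a", 0),
         ("b", 1), ("b", 1), ("b", 1), ("b", 0),
         ("c", 1), ("c", 0)]
e = estimate_propensity(units)  # a -> 1/2, b -> 3/4, c -> 1/2
treated_mix, control_mix = balance_within_strata(units, e)[Fraction(1, 2)]
print(treated_mix == control_mix)  # True: same a/c mix in both groups
```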
Biometrika · Pub Date: 2022-09-13 · DOI: 10.1093/biomet/asac043
A. Henzi, Johanna F. Ziegel
Correction to: ‘Valid sequential inference on probability forecast performance’
Biometrika · Pub Date: 2022-09-01 · Epub Date: 2021-11-02 · DOI: 10.1093/biomet/asab055
S Li, M Sesia, Y Romano, E Candès, C Sabatti
Searching for robust associations with a multi-environment knockoff filter
Abstract: This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is widely applicable, this paper highlights its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
Pages 611-629. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11022501/pdf/
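The method above builds on model-X knockoffs. For orientation, the sketch below shows only the standard single-environment knockoff+ selection step, assuming the feature statistics W_j (large positive values favour the original variable over its knockoff) have already been computed; generating knockoffs and combining evidence across environments, which is what the paper contributes, is not shown. The threshold is T = min{t > 0 : (1 + #{j : W_j ≤ -t}) / max(1, #{j : W_j ≥ t}) ≤ q}, with t ranging over the nonzero |W_j|, and the selected set is {j : W_j ≥ T}. The statistics in the example are hypothetical.

```python
import math


def knockoff_plus_threshold(w, q):
    """Knockoff+ threshold for statistics w at target FDR level q.

    Returns math.inf when no candidate threshold meets the bound,
    in which case nothing is selected.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        estimated_false = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if estimated_false / n_selected <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices j with W_j at or above the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten variables.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 1.5, 0.5, 0.0]
print(knockoff_plus_threshold(w, 0.2))  # 1.5
print(knockoff_select(w, 0.2))          # [0, 1, 2, 3, 4, 5, 7]
```

The "+1" in the numerator is what makes knockoff+ conservative: with very few candidate discoveries the ratio cannot fall below q, so a stricter level such as q = 0.05 selects nothing on this toy input.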
Biometrika · Pub Date: 2022-09-01 · DOI: 10.1093/biomet/asab053
Lu Mao
On the relative efficiency of the intent-to-treat Wilcoxon-Mann-Whitney test in the presence of noncompliance
Abstract: A general framework is set up to study the asymptotic properties of the intent-to-treat Wilcoxon-Mann-Whitney test in randomized experiments with nonignorable noncompliance. Under location-shift alternatives, the Pitman efficiencies of the intent-to-treat Wilcoxon-Mann-Whitney and $t$ tests are derived. It is shown that the former is superior if the compliers are more likely to be found in high-density regions of the outcome distribution or, equivalently, if the noncompliers tend to reside in the tails. By logical extension, the relative efficiency of the two tests is sharply bounded by least and most favourable scenarios in which the compliers are segregated into regions of lowest and highest density, respectively. Such bounds can be derived analytically as a function of the compliance rate for common location families such as Gaussian, Laplace, logistic and $t$ distributions. These results can help empirical researchers choose the more efficient test for existing data, and calculate sample size for future trials in anticipation of noncompliance.
Results for nonadditive alternatives and other tests follow along similar lines.
Volume 109 (3), pages 873-880. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9401868/pdf/asab053.pdf
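With full compliance the intent-to-treat tests reduce to the ordinary two-sample tests, and the classical Pitman efficiency of Wilcoxon-Mann-Whitney relative to the t test under a location shift, for an outcome density f with variance σ², is 12σ²(∫f²)². The noncompliance bounds derived in the paper are not reproduced here; this sketch only computes that full-compliance baseline numerically for three of the location families named above, using a plain trapezoidal rule on a truncated range (the ±40 truncation and the grid size are arbitrary choices). The known closed forms are 3/π for the Gaussian, π²/9 for the logistic and 3/2 for the Laplace.

```python
import math


def integrate(g, lo, hi, n=100_000):
    """Composite trapezoidal rule for g on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, n):
        total += g(lo + i * h)
    return total * h


def pitman_are_wmw_vs_t(f, lo=-40.0, hi=40.0):
    """12 * sigma^2 * (integral of f^2)^2 for a density f centred at 0."""
    variance = integrate(lambda x: x * x * f(x), lo, hi)
    int_f_sq = integrate(lambda x: f(x) ** 2, lo, hi)
    return 12.0 * variance * int_f_sq ** 2


# Unit-scale densities; the efficiency does not depend on the scale.
def gaussian(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def logistic(x):
    e = math.exp(-abs(x))  # symmetric form avoids overflow in the tails
    return e / (1.0 + e) ** 2


def laplace(x):
    return 0.5 * math.exp(-abs(x))


ares = {name: pitman_are_wmw_vs_t(f) for name, f in
        [("gaussian", gaussian), ("logistic", logistic), ("laplace", laplace)]}
for name, value in ares.items():
    print(name, round(value, 3))  # about 0.955, 1.097 and 1.5
```

The heavier-tailed families favour Wilcoxon-Mann-Whitney, which matches the abstract's intuition that the rank test gains when the relevant mass sits in high-density regions rather than in the tails.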
Biometrika · Pub Date: 2022-08-24 · DOI: 10.1093/biomet/asad030
Yongchang Su, Xinran Li
Treatment Effect Quantiles in Stratified Randomized Experiments and Matched Observational Studies
Abstract: Evaluating the treatment effect has become an important topic for many applications. However, most existing literature focuses mainly on average treatment effects. When the individual effects are heavy-tailed or have outlier values, not only may the average effect not be appropriate for summarizing treatment effects, but also the conventional inference for it can be sensitive and possibly invalid due to poor large-sample approximations. In this paper we focus on quantiles of individual treatment effects, which can be more robust in the presence of extreme individual effects. Moreover, our inference for them is purely randomization-based, avoiding any distributional assumptions on the units. We first consider inference in stratified randomized experiments, extending the recent work by Caughey et al. (2021). We show that the computation of valid p-values for testing null hypotheses on quantiles of individual effects can be transformed into instances of the multiple-choice knapsack problem, which can be efficiently solved exactly or slightly conservatively. We then extend our approach to matched observational studies and propose a sensitivity analysis to investigate to what extent our inference on quantiles of individual effects is robust to unmeasured confounding.
The proposed randomization inference and sensitivity analysis are simultaneously valid for all quantiles of individual effects, noting that the analysis for the maximum or minimum individual effect coincides with the conventional analysis assuming constant treatment effects.
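For readers unfamiliar with the multiple-choice knapsack problem mentioned above: items are split into groups, exactly one item must be chosen from each group, and the total value is maximized subject to the total weight not exceeding a capacity. A minimal dynamic-programming sketch for nonnegative integer weights follows. How the paper maps strata and hypothesized individual effects onto groups, weights and values is not reproduced here, so the instance below is purely hypothetical and illustrates only the generic problem.

```python
def mckp_max_value(groups, capacity):
    """Multiple-choice knapsack: pick exactly one (weight, value) item from
    each group so that total weight <= capacity and total value is maximal.

    Weights must be nonnegative integers. Returns None if no feasible choice.
    """
    neg_inf = float("-inf")
    # best[c] = highest value reachable with total weight exactly c so far.
    best = [0.0] + [neg_inf] * capacity
    for items in groups:
        nxt = [neg_inf] * (capacity + 1)
        for c, value_so_far in enumerate(best):
            if value_so_far == neg_inf:
                continue
            for weight, value in items:
                c_new = c + weight
                if c_new <= capacity and value_so_far + value > nxt[c_new]:
                    nxt[c_new] = value_so_far + value
        best = nxt
    top = max(best)
    return None if top == neg_inf else top


# Hypothetical instance: two groups of (weight, value) items.
groups = [[(1, 3), (2, 5)], [(1, 2), (3, 7)]]
print(mckp_max_value(groups, 4))  # 10.0
print(mckp_max_value(groups, 1))  # None
```

The running time is proportional to the capacity times the total number of items, which is the usual pseudo-polynomial cost of exact knapsack dynamic programming.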
Biometrika · Pub Date: 2022-08-19 · eCollection Date: 2023-06-01 · DOI: 10.1093/biomet/asac050
Miao Yu, Wenbin Lu, Shu Yang, Pulak Ghosh
A multiplicative structural nested mean model for zero-inflated outcomes
Abstract: Zero-inflated nonnegative outcomes are common in many applications. In this work, motivated by freemium mobile game data, we propose a class of multiplicative structural nested mean models for zero-inflated nonnegative outcomes which flexibly describes the joint effect of a sequence of treatments in the presence of time-varying confounders. The proposed estimator solves a doubly robust estimating equation, where the nuisance functions, namely the propensity score and conditional outcome means given confounders, are estimated parametrically or nonparametrically. To improve the accuracy, we leverage the characteristic of zero-inflated outcomes by estimating the conditional means in two parts, that is, separately modelling the probability of having positive outcomes given confounders, and the mean outcome conditional on its being positive and given the confounders. We show that the proposed estimator is consistent and asymptotically normal as either the sample size or the follow-up time goes to infinity. Moreover, the typical sandwich formula can be used to estimate the variance of treatment effect estimators consistently, without accounting for the variation due to estimating nuisance functions.
Simulation studies and an application to a freemium mobile game dataset are presented to demonstrate the empirical performance of the proposed method and support our theoretical findings.
Volume 110 (2), pages 519-536. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10183836/pdf/
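The two-part estimate of the conditional mean described above rests on the identity E[Y | X] = P(Y > 0 | X) · E[Y | Y > 0, X], valid for any nonnegative Y. A minimal sketch, assuming a single discrete confounder and plain empirical frequencies for each part; the function name and spending data are hypothetical, and the structural nested model and doubly robust equation themselves are not shown. With one empirical cell per confounder level the product simply reproduces the group mean; the gain the abstract refers to comes when each part is modelled separately, for example a binary regression for the first factor and a positive-valued regression for the second.

```python
from collections import defaultdict


def two_part_mean(data):
    """Estimate E[Y | X = x] as P(Y > 0 | X = x) * E[Y | Y > 0, X = x]
    for nonnegative outcomes and a discrete confounder x."""
    groups = defaultdict(list)
    for x, y in data:
        if y < 0:
            raise ValueError("outcomes must be nonnegative")
        groups[x].append(y)
    estimates = {}
    for x, ys in groups.items():
        positives = [y for y in ys if y > 0]
        # Part 1: probability of a positive outcome given x.
        p_positive = len(positives) / len(ys)
        # Part 2: mean outcome given that it is positive (0 if none are).
        mean_positive = sum(positives) / len(positives) if positives else 0.0
        estimates[x] = p_positive * mean_positive
    return estimates


# Hypothetical (confounder level, spending) pairs with many zeros.
data = [(0, 0.0), (0, 0.0), (0, 0.0), (0, 4.0),
        (1, 0.0), (1, 10.0), (1, 20.0)]
print(two_part_mean(data))  # group 0 -> 1.0, group 1 -> 10.0 (up to rounding)
```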
Biometrika · Pub Date: 2022-08-17 · DOI: 10.1093/biomet/asad026
Timo Dimitriadis, Tobias Fissler, Johanna F. Ziegel
Characterizing M-estimators
Abstract: We characterize the full classes of M-estimators for semiparametric models of general functionals by formally connecting the theory of consistent loss functions from forecast evaluation with the theory of M-estimation. This novel characterization result allows us to leverage existing results on loss functions known from the literature on forecast evaluation in estimation theory. We exemplify advantageous implications for the fields of robust, efficient, equivariant and Pareto-optimal M-estimation.
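The link above rests on the idea that a loss is consistent for a functional when that functional minimizes the expected loss, so minimizing the empirical average loss yields an M-estimator of it. A minimal sketch, assuming a location-only model and two textbook consistent losses (squared error for the mean, the pinball or check loss for quantiles); it illustrates the principle and does not reproduce the paper's characterization. The sample is hypothetical, and ternary search is used only because both empirical risks here are convex with a minimizer inside the data range.

```python
def squared_loss(theta, y):
    """Consistent for the mean."""
    return (theta - y) ** 2


def pinball_loss(alpha):
    """Check loss, consistent for the alpha-quantile."""
    def loss(theta, y):
        return (float(y <= theta) - alpha) * (theta - y)
    return loss


def m_estimate(loss, data, iters=200):
    """Minimize the average loss over theta by ternary search.

    Valid here because both losses give a convex empirical risk whose
    minimizer lies in [min(data), max(data)].
    """
    def risk(theta):
        return sum(loss(theta, y) for y in data) / len(data)

    lo, hi = min(data), max(data)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if risk(m1) < risk(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


data = [2.0, 7.0, 1.0, 9.0, 4.0]  # hypothetical sample
print(round(m_estimate(squared_loss, data), 6))       # 4.6, the sample mean
print(round(m_estimate(pinball_loss(0.5), data), 6))  # 4.0, the sample median
print(round(m_estimate(pinball_loss(0.3), data), 6))  # 2.0, the 0.3-quantile
```

The quantile minimizers are unique here because 5 × 0.5 and 5 × 0.3 are not integers; otherwise the empirical check risk is flat between two order statistics and any point in that interval is a minimizer.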
Biometrika · Pub Date: 2022-08-13 · eCollection Date: 2023-06-01 · DOI: 10.1093/biomet/asac047
Hunyong Cho, Shannon T Holloway, David J Couper, Michael R Kosorok
Multi-stage optimal dynamic treatment regimes for survival outcomes with dependent censoring
Abstract: We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the failure time to be conditionally independent of censoring and dependent on the treatment decision times, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the survival probability at a certain time-point. The estimator is constructed using generalized random survival forests and can have polynomial rates of convergence. Simulations and analysis of the Atherosclerosis Risk in Communities study data suggest that the new estimator brings higher expected outcomes than existing methods in various settings.
Volume 110 (2), pages 395-410. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10183834/pdf/
Biometrika · Pub Date: 2022-08-10 · DOI: 10.1093/biomet/asac046
Zheng Zhou, Yongdao Zhou
Optimal Row-Column Designs
Abstract: Row-column designs have been widely used in experiments involving double confounding. Among them, one that provides unconfounded estimation of all main effects and as many two-factor interactions as possible is preferred, and is called optimal. Most current work focuses on the construction of two-level row-column designs, while the corresponding optimality theory has been largely ignored. Moreover, most constructed designs contain at least one replicate of a full factorial design, which makes them inflexible as the number of factors increases. In this study, a theoretical framework is built up to evaluate the optimality of row-column designs with a prime number of levels. A method for constructing optimal row-column designs with a prime number of levels is proposed. Subsequently, optimal full factorial three-level row-column designs are constructed for any parameter combination. Optimal fractional factorial two-level and three-level row-column designs are also constructed to reduce cost.
Biometrika · Pub Date: 2022-07-12 · eCollection Date: 2023-06-01 · DOI: 10.1093/biomet/asac041
Yixuan Qiu, Jing Lei, Kathryn Roeder
Gradient-based sparse principal component analysis with extensions to online learning
Abstract: Sparse principal component analysis is an important technique for simultaneous dimensionality reduction and variable selection with high-dimensional data. In this work we combine the unique geometric structure of the sparse principal component analysis problem with recent advances in convex optimization to develop novel gradient-based sparse principal component analysis algorithms. These algorithms enjoy the same global convergence guarantee as the original alternating direction method of multipliers, and can be more efficiently implemented with the rich toolbox developed for gradient methods from the deep learning literature. Most notably, these gradient-based algorithms can be combined with stochastic gradient descent methods to produce efficient online sparse principal component analysis algorithms with provable numerical and statistical performance guarantees. The practical performance and usefulness of the new algorithms are demonstrated in various simulation studies.
As an application, we show how the scalability and statistical accuracy of our method enable us to find interesting functional gene groups in high-dimensional RNA sequencing data.
Volume 110 (2), pages 339-360. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10183835/pdf/