Statistical Science | Pub Date: 2024-11-01 | Epub Date: 2024-10-30 | DOI: 10.1214/24-sts936
Hani Doss, Antonio Linero
"Scalable Empirical Bayes Inference and Bayesian Sensitivity Analysis." Statistical Science 39(4): 601–622. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11654829/pdf/

Abstract: Consider a Bayesian setup in which we observe $Y$, whose distribution depends on a parameter $\theta$; that is, $Y \mid \theta \sim \pi_{Y \mid \theta}$. The parameter $\theta$ is unknown and treated as random, and a prior distribution chosen from some parametric family $\{\pi_\theta(\cdot\,; h),\ h \in \mathcal{H}\}$ is to be placed on it. For the subjective Bayesian there is a single prior in the family that represents his or her beliefs about $\theta$, but determining this prior is very often extremely difficult. In the empirical Bayes approach, the latent distribution on $\theta$ is estimated from the data, usually by choosing the value of the hyperparameter $h$ that maximizes some criterion. Arguably the most common criterion is the marginal likelihood of $h$, $m(h) = \int \pi_{Y \mid \theta}\, \nu_h(\theta)\, d\theta$, with $h$ chosen to maximize $m(\cdot)$. Unfortunately, except for a handful of textbook examples, analytic evaluation of $\arg\max_h m(h)$ is not feasible. The purpose of this paper is two-fold. First, we review the literature on estimating it and find that the most commonly used procedures are either potentially highly inaccurate or do not scale well with the dimension of $h$, the dimension of $\theta$, or both. Second, we present a method for estimating $\arg\max_h m(h)$, based on Markov chain Monte Carlo, that applies very generally and scales well with dimension. Let $g$ be a real-valued function of $\theta$, and let $I(h)$ be the posterior expectation of $g(\theta)$ when the prior is …
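To make the abstract's criterion concrete, here is a minimal sketch of empirical Bayes via marginal-likelihood maximization in one of the "textbook" cases where $m(h)$ is available in closed form, a beta-binomial model with a Beta(a, a) prior (so the hyperparameter is $h = a$). This is purely illustrative, with made-up data; it is not the paper's MCMC-based method, which is designed for settings where no such closed form exists.

```python
import math

# Hypothetical data: y_i successes out of n trials in each of five groups,
# each group's success probability drawn from a Beta(a, a) prior.
y = [3, 5, 7, 2, 6]
n = 10

def log_beta(a, b):
    # log of the Beta function via log-gamma.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(a):
    # log m(a) = sum_i [ log C(n, y_i) + log B(y_i + a, n - y_i + a) - log B(a, a) ],
    # the beta-binomial marginal likelihood of the hyperparameter a.
    total = 0.0
    for yi in y:
        total += (math.lgamma(n + 1) - math.lgamma(yi + 1) - math.lgamma(n - yi + 1)
                  + log_beta(yi + a, n - yi + a) - log_beta(a, a))
    return total

# Grid search for argmax_a m(a); adequate for a one-dimensional hyperparameter,
# which is exactly the regime where naive approaches still work.
grid = [0.1 * k for k in range(1, 501)]
a_hat = max(grid, key=log_marginal)
```

The paper's point is that once $h$ or $\theta$ is high-dimensional, neither the integral nor this kind of exhaustive maximization is tractable, motivating MCMC-based estimation of $\arg\max_h m(h)$.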
Statistical Science | Pub Date: 2024-05-01 | Epub Date: 2024-05-05 | DOI: 10.1214/23-sts900
Chuji Luo, Michael J Daniels
"Variable Selection Using Bayesian Additive Regression Trees." Statistical Science 39(2): 286–304. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11395240/pdf/

Abstract: Variable selection is an important statistical problem, and it becomes more challenging when the candidate predictors are of mixed type (e.g., continuous and binary) and impact the response variable in nonlinear and/or non-additive ways. In this paper, we review existing variable selection approaches for the Bayesian additive regression trees (BART) model, a nonparametric regression model flexible enough to capture interactions between predictors and nonlinear relationships with the response. An emphasis of this review is on the ability to identify relevant predictors. We also propose two variable importance measures that can be used in a permutation-based variable selection approach, as well as a backward variable selection procedure for BART. We introduce these variations as a way of illustrating limitations of, and opportunities for improving, current approaches, and we assess them via simulations.
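The permutation-based idea in this abstract can be sketched generically: permute one predictor at a time and measure how much a fitted model's error degrades. The toy "fitted model" below is a fixed linear function standing in for a fitted BART ensemble (which is not implemented here), and the data are synthetic; the sketch only illustrates the mechanics of permutation importance, not the paper's specific importance measures.

```python
import random

# Toy data: three predictors, of which only the first two matter.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [2.0 * x[0] + 0.5 * x[1] for x in X]  # feature 2 is irrelevant

def predict(x):
    # Stand-in for the fitted model's prediction (here, the true function).
    return 2.0 * x[0] + 0.5 * x[1]

def mse(X, y):
    return sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

base = mse(X, y)

def importance(j):
    # Permute column j and report the resulting increase in MSE.
    col = [x[j] for x in X]
    random.shuffle(col)
    Xp = [x[:j] + [c] + x[j + 1:] for x, c in zip(X, col)]
    return mse(Xp, y) - base

imps = [importance(j) for j in range(3)]
```

The strongly predictive feature incurs the largest error increase when permuted, while the irrelevant one incurs none; thresholding such scores (e.g., against a null distribution obtained by permuting the response) yields a variable selection rule.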
Carly Lupton Brantner, Ting-Hsuan Chang, Trang Quynh Nguyen, Hwanhee Hong, Leon Di Stefano, Elizabeth A. Stuart
"Methods for Integrating Trials and Non-experimental Data to Examine Treatment Effect Heterogeneity." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts890

Abstract: Estimating treatment effects conditional on observed covariates can improve our ability to tailor treatments to particular individuals. Doing so effectively requires dealing with potential confounding, and also enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods broken down by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect and discuss differences between parametric and nonparametric estimators, and we list key assumptions, both those required within a single study and those necessary for data combination. After describing existing approaches, we compare and contrast them and identify open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that substantial work remains to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.
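The conditional average treatment effect (CATE) the abstract defines, $\tau(x) = E[Y(1) - Y(0) \mid X = x]$, can be illustrated with the simplest nonparametric-flavored strategy, a "T-learner": fit one outcome model per treatment arm and take the difference of predictions. The simulated single-covariate trial and the linear outcome models below are illustrative assumptions, not any specific method from the review.

```python
import random

# Simulated randomized trial: one covariate x, binary treatment a,
# outcome y with true CATE tau(x) = 1 + 2x.
random.seed(1)
data = []
for _ in range(500):
    x = random.random()
    a = random.randint(0, 1)  # randomization removes confounding
    y = x + a * (1.0 + 2.0 * x) + random.gauss(0, 0.1)
    data.append((x, a, y))

def fit_ols(points):
    # Simple least-squares line y = b0 + b1 * x through (x, y) pairs.
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    b1 = (sum((p[0] - mx) * (p[1] - my) for p in points)
          / sum((p[0] - mx) ** 2 for p in points))
    return my - b1 * mx, b1

# T-learner: separate outcome model in each arm, then contrast.
b0_c, b1_c = fit_ols([(x, y) for x, a, y in data if a == 0])
b0_t, b1_t = fit_ols([(x, y) for x, a, y in data if a == 1])

def cate(x):
    return (b0_t + b1_t * x) - (b0_c + b1_c * x)
```

With observational rather than trial data, the same contrast is only valid under the unconfoundedness-type assumptions the review enumerates for data combination.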
Alicia L Carriquiry, Michael J. Daniels, Nancy M Reid
"Editorial: Special Issue on Reproducibility and Replicability." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts909
Francesca Freuli, Leonhard Held, Rachel Heyard
"Replication Success Under Questionable Research Practices—a Simulation Study." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts904

Abstract: Increasing evidence suggests that the reproducibility and replicability of scientific findings are threatened by researchers employing questionable research practices (QRPs) in order to achieve statistically significant results. Numerous metrics have been developed to determine replication success, but it has not yet been investigated how well those metrics perform in the presence of QRPs. This paper compares the performance of different metrics quantifying replication success in the presence of four types of QRPs: cherry picking of outcomes, questionable interim analyses, questionable inclusion of covariates, and questionable subgroup analyses. Our results show that the metric based on the version of the sceptical p-value recalibrated in terms of effect size performs best at maintaining a low overall type-I error rate, but often requires larger replication sample sizes than metrics based on significance, the controlled version of the sceptical p-value, meta-analysis, or Bayes factors, especially when severe QRPs are employed.
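The first QRP on the paper's list, cherry-picking of outcomes, is easy to demonstrate by simulation: under a global null, testing k outcomes and reporting only the smallest p-value inflates the type-I error rate from $\alpha$ to roughly $1 - (1-\alpha)^k$. The simulation below is a generic illustration of this inflation, not a reproduction of the paper's simulation design.

```python
import random, math

random.seed(2)

def one_sided_p(n):
    # One-sided z-test p-value from n standard-normal draws (null is true).
    z = sum(random.gauss(0, 1) for _ in range(n)) / math.sqrt(n)
    return 0.5 * math.erfc(z / math.sqrt(2))

alpha, k, reps = 0.05, 5, 2000
# Report the best of k independent outcomes in each simulated study.
hits = sum(min(one_sided_p(20) for _ in range(k)) < alpha for _ in range(reps))
rate = hits / reps  # roughly 1 - 0.95**5 ≈ 0.23, far above the nominal 0.05
```

Replication-success metrics must then distinguish such inflated "significant" original findings from genuine effects, which is exactly the comparison the paper carries out.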
Aaditya Ramdas, Peter Grünwald, Vladimir Vovk, Glenn Shafer
"Game-Theoretic Statistics and Safe Anytime-Valid Inference." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts894

Abstract: Safe anytime-valid inference (SAVI) provides measures of statistical evidence and certainty (e-processes for testing and confidence sequences for estimation) that remain valid at all stopping times, accommodating continuous monitoring and analysis of accumulating data and optional stopping or continuation for any reason. These measures crucially rely on test martingales, which are nonnegative martingales starting at one. Since a test martingale is the wealth process of a player in a betting game, SAVI centrally employs game-theoretic intuition, language, and mathematics. We summarize the SAVI goals and philosophy, and report recent advances in testing composite hypotheses and estimating functionals in nonparametric settings.
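The test-martingale idea in this abstract can be made concrete with the simplest possible betting game: testing H0 "the coin is fair" by repeatedly staking a fixed fraction of wealth on heads. Under H0 the wealth process is a nonnegative martingale starting at one, so by Ville's inequality $P(\sup_t W_t \ge 1/\alpha) \le \alpha$, and "wealth exceeds $1/\alpha$" is an anytime-valid rejection rule. The fixed betting fraction below is an arbitrary illustrative choice, not an optimal strategy from the paper.

```python
import random

random.seed(3)

def wealth_path(p_true, n, bet=0.2):
    # Multiply wealth by (1 + bet) on heads, (1 - bet) on tails; under a fair
    # coin E[W_{t+1} | W_t] = W_t, so this is a test martingale for H0: p = 0.5.
    w, path = 1.0, []
    for _ in range(n):
        heads = random.random() < p_true
        w *= (1 + bet) if heads else (1 - bet)
        path.append(w)
    return path

fair = wealth_path(0.5, 500)    # wealth stays modest under H0
biased = wealth_path(0.8, 500)  # wealth grows exponentially under the alternative
```

Because validity holds at every stopping time, the analyst may monitor `biased` continuously and stop the moment wealth crosses $1/\alpha$, with no penalty for optional stopping.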
David S. Robertson, James M. S. Wason, Aaditya Ramdas
"Online Multiple Hypothesis Testing." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts901

Abstract: Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one by one over time. However, traditional procedures that control the FDR, such as the Benjamini–Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address these challenges, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based both on the evidence against it and on the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
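The simplest member of this framework is an alpha-spending (online Bonferroni) rule: assign each arriving hypothesis $j$ a budget $\alpha_j$ with $\sum_j \alpha_j \le \alpha$, so error control holds however long the stream runs. The sketch below uses the summable sequence $\alpha_j = \alpha \cdot 6/(\pi^2 j^2)$ with a made-up stream of p-values; the more powerful adaptive FDR procedures surveyed in the paper (e.g., LORD, SAFFRON) instead let past rejections earn back testing budget.

```python
import math

def alpha_spending(p_stream, alpha=0.05):
    # Online Bonferroni: test hypothesis j at level alpha * 6 / (pi^2 * j^2),
    # whose sum over j = 1, 2, ... equals alpha.
    decisions = []
    for j, p in enumerate(p_stream, start=1):
        alpha_j = alpha * 6.0 / (math.pi ** 2 * j ** 2)
        decisions.append(p < alpha_j)
    return decisions

# Hypothetical stream: one strong signal among nulls. Note the per-test level
# shrinks quickly, so late weak signals (like p = 0.03 at j = 4) are missed --
# the conservatism that adaptive online FDR methods are designed to fix.
ps = [0.8, 1e-6, 0.4, 0.03, 0.5]
dec = alpha_spending(ps)
```

Decisions are made one at a time, using only the p-values seen so far, which is the defining constraint of the online setting.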
Dominik Rothenhäusler, Peter Bühlmann
"Distributionally Robust and Generalizable Inference." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts902

Abstract: We discuss recently developed methods that quantify the stability and generalizability of statistical findings under distributional changes. In many practical problems, the data are not drawn i.i.d. from the target population. For example, unobserved sampling bias, batch effects, or unknown associations might inflate the variance compared to i.i.d. sampling. For reliable statistical inference, it is thus necessary to account for these types of variation. We discuss and review two methods that quantify distributional stability based on a single dataset. The first computes the sensitivity of a parameter under worst-case distributional perturbations, to understand which types of shift pose a threat to external validity. The second treats distributional shifts as random, which allows average robustness (rather than worst-case robustness) to be assessed. Based on a stability analysis of multiple estimators on a single dataset, it integrates both sampling and distributional uncertainty into a single confidence interval.
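The worst-case sensitivity idea in the first method can be illustrated with a deliberately simple perturbation class: reweight the empirical distribution by a likelihood ratio bounded in $[1/c, c]$ and ask how far the mean can move. This bounded-ratio ball is an illustrative stand-in chosen here for tractability, not the perturbation class used in the paper; the extremal weighting has a threshold form (weight $c$ on the largest values, $1/c$ on the rest), so a scan over split points finds the exact maximum.

```python
def worst_case_mean(xs, c):
    # Upper bound on the mean over reweightings with likelihood ratio in
    # [1/c, c]: the maximizer puts weight c on the top values and 1/c on the
    # bottom values; scan every split point and keep the best normalized mean.
    xs = sorted(xs)
    n = len(xs)
    best = sum(xs) / n  # uniform weights (no perturbation)
    for k in range(1, n):  # bottom k points get 1/c, top n - k get c
        w = [1.0 / c] * k + [c] * (n - k)
        tot = sum(w)
        best = max(best, sum(wi * xi for wi, xi in zip(w, xs)) / tot)
    return best

# Hypothetical sample with one large observation.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
plain = sum(xs) / len(xs)              # unperturbed mean: 4.0
shifted = worst_case_mean(xs, c=2.0)   # worst-case mean under the c = 2 ball
```

If even mild perturbations (small $c$) move the parameter far from its unperturbed value, the finding is fragile to the corresponding type of shift, which is the diagnostic the abstract describes.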
Marina Bogomolov, Ruth Heller
"Replicability Across Multiple Studies." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts892

Abstract: Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive because discoveries are possible even when all the individual studies are underpowered. However, meta-analytic discoveries may be entirely driven by signal in a single study, and thus nonreplicable. Although the great majority of meta-analyses carried out to date do not draw inferences about the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out to establish the replicability of scientific findings. We describe methods for the setting where a single outcome is examined in multiple studies (as is common in systematic reviews of medical interventions), as well as for the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some current shortcomings and future directions.
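A standard tool in this literature is the partial conjunction p-value, which tests whether the signal is present in at least $u$ of the $n$ studies; taking $u = 2$ already rules out a discovery driven by a single study. The sketch below uses the simple Bonferroni-based version, $p_{u/n} = (n - u + 1)\, p_{(u)}$ capped at 1, where $p_{(u)}$ is the $u$-th smallest study p-value; the example p-values are made up for illustration.

```python
def partial_conjunction_p(pvals, u):
    # Bonferroni-based partial conjunction p-value for "signal in >= u of n
    # studies": (n - u + 1) times the u-th smallest p-value, capped at 1.
    p_sorted = sorted(pvals)
    return min(1.0, (len(pvals) - u + 1) * p_sorted[u - 1])

# A meta-analytic signal driven by one study vs. one spread over two studies.
one_study = [1e-8, 0.6, 0.7]
two_studies = [1e-8, 0.001, 0.7]
p_one = partial_conjunction_p(one_study, u=2)   # 2 * 0.6, capped: 1.0
p_two = partial_conjunction_p(two_studies, u=2) # 2 * 0.001 = 0.002
```

Both datasets would yield a tiny combined (global null) p-value, yet only the second survives the replicability test, which is exactly the distinction the abstract draws between meta-analytic discovery and replicable discovery.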
Giovanni Parmigiani
"Defining Replicability of Prediction Rules." Statistical Science, 2023-11-01. DOI: 10.1214/23-sts891

Abstract: In this article, I propose an approach for defining replicability for prediction rules. Motivated by a recent report by the U.S. National Academy of Sciences, I start from the perspective that replicability is obtaining consistent results across studies suitable to address the same prediction question, each of which has obtained its own data. I then discuss concepts and issues in defining the key elements of this statement. I focus specifically on the meaning of "consistent results" in typical utilization contexts, and propose a multi-agent framework for defining replicability, in which agents are neither allies nor adversaries. I recover some prevalent practical approaches as special cases. I hope to provide guidance for a more systematic assessment of replicability in machine learning.