{"title":"Effect of Model Space Priors on Statistical Inference with Model Uncertainty","authors":"Anupreet Porwal, A. Raftery","doi":"10.51387/22-nejsds14","DOIUrl":"https://doi.org/10.51387/22-nejsds14","url":null,"abstract":"Bayesian model averaging (BMA) provides a coherent way to account for model uncertainty in statistical inference tasks. BMA requires specification of model space priors and parameter space priors. In this article we focus on comparing different model space priors in the presence of model uncertainty. We consider eight reference model space priors used in the literature and three adaptive parameter priors recommended by Porwal and Raftery [37]. We assess the performance of these combinations of prior specifications for variable selection in linear regression models for the statistical tasks of parameter estimation, interval estimation, inference, point and interval prediction. We carry out an extensive simulation study based on 14 real datasets representing a range of situations encountered in practice. We found that beta-binomial model space priors specified in terms of the prior probability of model size performed best on average across various statistical tasks and datasets, outperforming priors that were uniform across models. Recently proposed complexity priors performed relatively poorly.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79901780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequentism","authors":"A. van der Vaart","doi":"10.51387/22-nejsds4a","DOIUrl":"https://doi.org/10.51387/22-nejsds4a","url":null,"abstract":"Discussion of “Four types of frequentism and their interplay with Bayesianism” by Jim Berger.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85232851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Bayesian Sequential Clinical Trial Designs","authors":"Tianjian Zhou, Yuan Ji","doi":"10.51387/23-NEJSDS24","DOIUrl":"https://doi.org/10.51387/23-NEJSDS24","url":null,"abstract":"Clinical trials usually involve sequential patient entry. When designing a clinical trial, it is often desirable to include a provision for interim analyses of accumulating data with the potential for stopping the trial early. We review Bayesian sequential clinical trial designs based on posterior probabilities, posterior predictive probabilities, and decision-theoretic frameworks. A pertinent question is whether Bayesian sequential designs need to be adjusted for the planning of interim analyses. We answer this question from three perspectives: a frequentist-oriented perspective, a calibrated Bayesian perspective, and a subjective Bayesian perspective. We also provide new insights into the likelihood principle, which is commonly tied to statistical inference and decision making in sequential clinical trials. Some theoretical results are derived, and numerical studies are conducted to illustrate and assess these designs.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73067114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Poisson Modeling and Predicting English Premier League Goal Scoring","authors":"Quang-Nguyen Nguyen","doi":"10.51387/21-NEJSDS1","DOIUrl":"https://doi.org/10.51387/21-NEJSDS1","url":null,"abstract":"The English Premier League is well-known for being not only one of the most popular professional sports leagues in the world, but also one of the toughest competitions to predict. The first purpose of this research was to verify the consistency between goal scoring in the English Premier League and the Poisson process; specifically, the relationships between the number of goals scored in a match and the Poisson distribution, the time between goals throughout the course of a season and the exponential distribution, and the time location of goals during football games and the continuous uniform distribution. We found that the Poisson process and the three probability distributions accurately describe Premier League goal scoring. In addition, Poisson regression was utilized to predict outcomes for a Premier League season, using different sets of season data and with a large number of simulations being involved. We examined and compared various soccer metrics from our simulation results, including an English club’s chances of being the champions, finishing in the top four and bottom three, and relegation points.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84322756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New Perspectives on Centering","authors":"Jack Prothero, Jan Hannig, J. Marron","doi":"10.51387/23-nejsds31","DOIUrl":"https://doi.org/10.51387/23-nejsds31","url":null,"abstract":"Data matrix centering is an ever-present yet under-examined aspect of data analysis. Functional data analysis (FDA) often operates with a default of centering such that the vectors in one dimension have mean zero. We find that centering along the other dimension identifies a novel useful mode of variation beyond those familiar in FDA. We explore ambiguities in both matrix orientation and nomenclature. Differences between centerings and their potential interaction can be easily misunderstood. We propose a unified framework and new terminology for centering operations. We clearly demonstrate the intuition behind and consequences of each centering choice with informative graphics. We also propose a new direction energy hypothesis test as part of a series of diagnostics for determining which choice of centering is best for a data set. We explore the application of these diagnostics in several FDA settings.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88240517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Compartment Model of Human Mobility and Early Covid-19 Dynamics in NYC","authors":"Ian Frankenburg, Sudipto Banerjee","doi":"10.51387/21-NEJSDS2","DOIUrl":"https://doi.org/10.51387/21-NEJSDS2","url":null,"abstract":"In this paper, we build a mechanistic system to understand the relation between a reduction in human mobility and Covid-19 spread dynamics within New York City. To this end, we propose a multivariate compartmental model that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general Bayesian hierarchical model to provide uncertainty quantification of resulting estimates. The open-source probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and out-of-sample forecasting, we find our simple and interpretable model provides evidence that reductions in human mobility altered case dynamics.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81425031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison Between Bayesian and Frequentist Tail Probability Estimates","authors":"Nan Shen, B. Gonz'alez, L. Pericchi","doi":"10.51387/23-nejsds39","DOIUrl":"https://doi.org/10.51387/23-nejsds39","url":null,"abstract":"Tail probability plays an important part in the extreme value theory. Sometimes the conclusions from two approaches for estimating the tail probability of extreme events, the Bayesian and the frequentist methods, can differ a lot. In 1999, a rainfall that caused more than 30,000 deaths in Venezuela was not captured by the simple frequentist extreme value techniques. However, this catastrophic rainfall was not surprising if the Bayesian inference was used to allow for parameter uncertainty and the full available data was exploited [4].\u0000In this paper, we investigate the reasons that the Bayesian estimator of the tail probability is always higher than the frequentist estimator. Sufficient conditions for this phenomenon are established both by using Jensen’s Inequality and by looking at Taylor series approximations, both of which point to the convexity of the distribution function.","PeriodicalId":94360,"journal":{"name":"The New England Journal of Statistics in Data Science","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86050401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}