{"title":"A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning","authors":"M. Tec, Yunshan Duan, P. Müller","doi":"10.1080/00031305.2022.2129787","DOIUrl":"https://doi.org/10.1080/00031305.2022.2129787","url":null,"abstract":"Abstract Reinforcement learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has been an increasing interest in RL techniques for healthcare applications. We introduce two related applications as motivating examples. In both applications, the sequential nature of the decisions is restricted to sequential stopping. Rather than a comprehensive survey, the focus of the discussion is on solutions using standard tools for these two relatively simple sequential stopping problems. Both problems are inspired by adaptive clinical trial design. We use examples to explain the terminology and mathematical background that underlie each framework and map one to the other. The implementations and results illustrate the many similarities between RL and BSD. 
The results motivate the discussion of the potential strengths and limitations of each approach.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123208276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment on “On Optimal Correlation-Based Prediction,” by Bottai et al. (2022)","authors":"R. Christensen","doi":"10.1080/00031305.2022.2055644","DOIUrl":"https://doi.org/10.1080/00031305.2022.2055644","url":null,"abstract":"Bottai et al. (2022) examine the best predictors that maximize two correlation criteria and in particular examine predictors that are restricted to have the same mean and variance as what they are trying to predict. We give a brief demonstration that their best correlation predictor, subject to the mean and variance conditions, also minimizes the expected squared error prediction loss subject to those constraints on the predictors. ... to predict y from the values of x_1, ..., x_{p-1}. ... the vector x = (x_1, ..., x_{p-1})′. A reasonable criterion for choosing a predictor of y is to pick a predictor h(x) that minimizes the mean squared error E[y − h(x)]^2. The expected value is taken over the joint distribution of y and x. It is well known that the best predictor is (essentially), using notation from both Christensen (2020, sec. 6.3) and Bottai et al. (2022),","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128310276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Optimal Correlation-Based Prediction","authors":"Matteo Bottai, Taeho Kim, Benjamin Lieberman, G. Luta, Edsel A. Peña","doi":"10.1080/00031305.2022.2051604","DOIUrl":"https://doi.org/10.1080/00031305.2022.2051604","url":null,"abstract":"Abstract This note examines, at the population-level, the approach of obtaining predictors of a random variable Y, given the joint distribution of , by maximizing the mapping for a given correlation function . Commencing with Pearson’s correlation function, the class of such predictors is uncountably infinite. The least-squares predictor is an element of this class obtained by equating the expectations of Y and to be equal and the variances of and to be also equal. On the other hand, replacing the second condition by the equality of the variances of Y and , a natural requirement for some calibration problems, the unique predictor that is obtained has the maximum value of Lin’s (1989) concordance correlation coefficient (CCC) with Y among all predictors. Since the CCC measures the degree of agreement, the new predictor is called the maximal agreement predictor. These predictors are illustrated for three special distributions: the multivariate normal distribution; the exponential distribution, conditional on covariates; and the Dirichlet distribution. 
The exponential distribution is relevant in survival analysis or in reliability settings, while the Dirichlet distribution is relevant for compositional data.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130796088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bias Analysis for Misclassification Errors in both the Response Variable and Covariate","authors":"Juxin Liu, Annshirley Afful, H. Mansell, Yanyuan Ma","doi":"10.1080/00031305.2022.2066725","DOIUrl":"https://doi.org/10.1080/00031305.2022.2066725","url":null,"abstract":"Abstract Much literature has focused on statistical inference for misclassified response variables or misclassified covariates. However, misclassification in both the response variable and the covariate has received very limited attention within applied fields and the statistics community. In situations where the response variable and the covariate are simultaneously subject to misclassification errors, an assumption of independent misclassification errors is often used for convenience without justification. This article aims to show the harmful consequences of inappropriate adjustment for joint misclassification errors. In particular, we focus on the wrong adjustment by ignoring the dependence between the misclassification process of the response variable and the covariate. In this article, the dependence of misclassification in both variables is characterized by covariance-type parameters. We extend the original definition of dependence parameters to a more general setting. We discover a single quantity that governs the dependence of the two misclassification processes. Moreover, we propose likelihood ratio tests to check the nondifferential/independent misclassification assumption in main study/internal validation study designs. Our simulation studies indicate that ignoring the dependent error structure can be even worse than ignoring all the misclassification errors when the validation data size is relatively small. 
The methodology is illustrated by a real data example.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115801708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayes Factors and Posterior Estimation: Two Sides of the Very Same Coin","authors":"Harlan Campbell, P. Gustafson","doi":"10.1080/00031305.2022.2139293","DOIUrl":"https://doi.org/10.1080/00031305.2022.2139293","url":null,"abstract":"Abstract Recently, several researchers have claimed that conclusions obtained from a Bayes factor (or the posterior odds) may contradict those obtained from Bayesian posterior estimation. In this article, we wish to point out that no such “contradiction” exists if one is willing to consistently define one’s priors and posteriors. The key for congruence is that the (implied) prior model odds used for testing are the same as those used for estimation. Our recommendation is simple: If one reports a Bayes factor comparing two models, then one should also report posterior estimates which appropriately acknowledge the uncertainty with regards to which of the two models is correct.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122958642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Transformation of Treated-Control Matched-Pair Differences for Graphical Display","authors":"P. Rosenbaum","doi":"10.1080/00031305.2022.2063944","DOIUrl":"https://doi.org/10.1080/00031305.2022.2063944","url":null,"abstract":"Abstract A new transformation is proposed for treated-minus-control matched pair differences that leaves the center of their distribution untouched, but symmetrically and smoothly transforms and shortens the tails. In this way, the center of the distribution is interpretable, undistorted and uncompressed, yet outliers are clear and distinct along the periphery. The transformation of pair differences, ,is strictly increasing, continuous, differentiable and odd, , so its action in the extreme upper tail mirrors its action in the extreme lower tail. Moreover, the center of the distribution—typically 90% or 95% of the distribution—is not transformed, with for , yet the nonlinear transformation of the tails is barely perceptible as it begins at , in the sense that , where is the derivative of . The transformation is applied to an observational study of the effect of light daily alcohol consumption on the level of HDL cholesterol. The study has three control groups intended to address specific unmeasured biases; so, several types of pair differences require coordinated depiction focused on unmeasured bias, not outliers. 
An R package tailTransform implements the method, contains the data, and reproduces aspects of the graphs and data analysis.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"161 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129103519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistics in Medicine","authors":"W. Dai, T. Hamasaki","doi":"10.1080/00031305.2022.2054626","DOIUrl":"https://doi.org/10.1080/00031305.2022.2054626","url":null,"abstract":"In response to the ongoing global public health crisis since the Spring of 2020, there has been an urgent need to study infectious diseases by using massive amounts of collected data. A Bayesian inferential strategy allows us to simultaneously characterize and forecast the spread of infectious disease while quantifying the uncertainties. Bayesian Analysis of Infectious Diseases comes out at a perfect and critical time to introduce the latest Bayesian techniques for the statistical analysis of infectious diseases. Based on the authors’ cumulative expertise, comprehensive explorations of various topics and case studies are generously provided from beginning to end. This book will greatly benefit statisticians, epidemiologists, and especially graduate students who are interested in this popular topic. Chapter 1 provides a high-level overview of infectious diseases and their analyses using multiple reliable resources including books, articles, and websites. Chapter 2 starts with a brief introduction to the history of Bayesian statistics and the basic theory required for performing Bayesian data analysis. Fundamental concepts including data likelihood, prior, posterior, and predictive distributions are clearly explained and illustrated using several common models including Bernoulli, Poisson, Gaussian, and most importantly, the simplest epidemiological susceptible-infectious (SI) model. Three major types of inferences are discussed in great detail including estimation, hypothesis testing, and prediction. I truly appreciate that the authors provide straightforward R code to implement almost every illustrated model throughout the book, not only this chapter. 
In parallel with the previous chapter, Chapter 3 describes the underlying mechanism of infectious diseases that should be understood before statistical modeling, including how our immune system fights disease, how drugs attack infections, and how vaccines work. The authors make tremendous efforts to improve the reading experience especially for those with limited biological knowledge. I personally really like Table 3.1 on pp. 44–47, which summarizes the important characteristics of well-known infectious diseases. The chapter also briefly introduces emerging infectious diseases such as the coronavirus. Chapters 4 and 5 focus on Bayesian inference of the discrete-time Markov chain with applications in biology. Chapter 4 illustrates concepts of the discrete-time Markov chain, a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Those concepts are the theoretical foundation for Markov chain Monte Carlo techniques, which have significantly advanced Bayesian statistics in the past decades. Chapter 5 further illustrates how to apply Bayesian inference of discrete-time Markov chain to understand the mechanism of various biological phenomena through several classical processes including the stochastic suscep","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115098237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Analysis of Infectious Diseases: COVID-19 and Beyond.","authors":"Qiwei Li","doi":"10.1080/00031305.2022.2054625","DOIUrl":"https://doi.org/10.1080/00031305.2022.2054625","url":null,"abstract":"In response to the ongoing global public health crisis since the Spring of 2020, there has been an urgent need to study infectious diseases by using massive amounts of collected data. A Bayesian inferential strategy allows us to simultaneously characterize and forecast the spread of infectious disease while quantifying the uncertainties. Bayesian Analysis of Infectious Diseases comes out at a perfect and critical time to introduce the latest Bayesian techniques for the statistical analysis of infectious diseases. Based on the authors’ cumulative expertise, comprehensive explorations of various topics and case studies are generously provided from beginning to end. This book will greatly benefit statisticians, epidemiologists, and especially graduate students who are interested in this popular topic. Chapter 1 provides a high-level overview of infectious diseases and their analyses using multiple reliable resources including books, articles, and websites. Chapter 2 starts with a brief introduction to the history of Bayesian statistics and the basic theory required for performing Bayesian data analysis. Fundamental concepts including data likelihood, prior, posterior, and predictive distributions are clearly explained and illustrated using several common models including Bernoulli, Poisson, Gaussian, and most importantly, the simplest epidemiological susceptible-infectious (SI) model. Three major types of inferences are discussed in great detail including estimation, hypothesis testing, and prediction. I truly appreciate that the authors provide straightforward R code to implement almost every illustrated model throughout the book, not only this chapter. 
In parallel with the previous chapter, Chapter 3 describes the underlying mechanism of infectious diseases that should be understood before statistical modeling, including how our immune system fights disease, how drugs attack infections, and how vaccines work. The authors make tremendous efforts to improve the reading experience especially for those with limited biological knowledge. I personally really like Table 3.1 on pp. 44–47, which summarizes the important characteristics of well-known infectious diseases. The chapter also briefly introduces emerging infectious diseases such as the coronavirus. Chapters 4 and 5 focus on Bayesian inference of the discrete-time Markov chain with applications in biology. Chapter 4 illustrates concepts of the discrete-time Markov chain, a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Those concepts are the theoretical foundation for Markov chain Monte Carlo techniques, which have significantly advanced Bayesian statistics in the past decades. Chapter 5 further illustrates how to apply Bayesian inference of discrete-time Markov chain to understand the mechanism of various biological phenomena through several classical processes including the stochastic suscep","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134016812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“Two Truths and a Lie” as a Class-Participation Activity","authors":"A. Gelman","doi":"10.1080/00031305.2022.2058612","DOIUrl":"https://doi.org/10.1080/00031305.2022.2058612","url":null,"abstract":"Abstract We adapt the social game “Two truths and a lie” to a classroom setting to give an activity that introduces principles of statistical measurement, uncertainty, prediction, and calibration, while giving students an opportunity to meet each other. We discuss how this activity can be used in a range of different statistics courses.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129682354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Look into the Problem of Preferential Sampling through the Lens of Survey Statistics","authors":"Daniel Vedensky, Paul A. Parker, S. Holan","doi":"10.1080/00031305.2022.2143898","DOIUrl":"https://doi.org/10.1080/00031305.2022.2143898","url":null,"abstract":"Abstract An evolving problem in the field of spatial and ecological statistics is that of preferential sampling, where biases may be present due to a relationship between sample data locations and a response of interest. This field of research bears a striking resemblance to the longstanding problem of informative sampling within survey methodology, although with some important distinctions. With the goal of promoting collaborative effort within and between these two problem domains, we make comparisons and contrasts between the two problem statements. Specifically, we review many of the solutions available to address each of these problems, noting the important differences in modeling techniques. Additionally, we construct a series of simulation studies to examine some of the methods available for preferential sampling, as well as a comparison analyzing heavy metal biomonitoring data.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128037960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}