{"title":"Neural Methods for Amortized Inference","authors":"Andrew Zammit-Mangion, Matthew Sainsbury-Dale, Raphaël Huser","doi":"10.1146/annurev-statistics-112723-034123","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034123","url":null,"abstract":"Simulation-based methods for statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimization libraries, and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortized, in the sense that, after an initial setup cost, they allow rapid inference through fast feed-forward operations. In this article we review recent progress in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. We also cover software and include a simple illustration to showcase the wide array of tools available for amortized inference and the benefits they offer over Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"95 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infectious Disease Modeling","authors":"Jing Huang, Jeffrey S. Morris","doi":"10.1146/annurev-statistics-112723-034351","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034351","url":null,"abstract":"Infectious diseases pose a persistent challenge to public health worldwide. Recent global health crises, such as the COVID-19 pandemic and Ebola outbreaks, have underscored the vital role of infectious disease modeling in guiding public health policy and response. Infectious disease modeling is a critical tool for society, informing risk mitigation measures, prompting timely interventions, and aiding preparedness for healthcare delivery systems. This article synthesizes the current landscape of infectious disease modeling, emphasizing the integration of statistical methods in understanding and predicting the spread of infectious diseases. We begin by examining the historical context and the foundational models that have shaped the field, such as the SIR (susceptible, infectious, recovered) and SEIR (susceptible, exposed, infectious, recovered) models. Subsequently, we delve into the methodological innovations that have arisen, including stochastic modeling, network-based approaches, and the use of big data analytics. We also explore the integration of machine learning techniques in enhancing model accuracy and responsiveness. The review identifies the challenges of parameter estimation, model validation, and the incorporation of real-time data streams. Moreover, we discuss the ethical implications of modeling, such as privacy concerns and the communication of risk. The article concludes by discussing future directions for research, highlighting the need for data integration and interdisciplinary collaboration for advancing infectious disease modeling.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"15 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensors in High-Dimensional Data Analysis: Methodological Opportunities and Theoretical Challenges","authors":"Arnab Auddy, Dong Xia, Ming Yuan","doi":"10.1146/annurev-statistics-112723-034548","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034548","url":null,"abstract":"Large amounts of multidimensional data represented by multiway arrays or tensors are prevalent in modern applications across various fields such as chemometrics, genomics, physics, psychology, and signal processing. The structural complexity of such data provides vast new opportunities for modeling and analysis, but efficiently extracting information content from them, both statistically and computationally, presents unique and fundamental challenges. Addressing these challenges requires an interdisciplinary approach that brings together tools and insights from statistics, optimization, and numerical linear algebra, among other fields. Despite these hurdles, significant progress has been made in the past decade. This review seeks to examine some of the key advancements and identify common threads among them, under a number of different statistical settings.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"40 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Excess Mortality Estimation","authors":"Jon Wakefield, Victoria Knutson","doi":"10.1146/annurev-statistics-112723-034236","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034236","url":null,"abstract":"Estimating the mortality associated with a specific mortality crisis event (for example, a pandemic, natural disaster, or conflict) is clearly an important public health undertaking. In many situations, deaths may be directly or indirectly attributable to the mortality crisis event, and both contributions may be of interest. The totality of the mortality impact on the population (direct and indirect deaths) includes the knock-on effects of the event, such as a breakdown of the health care system, or increased mortality due to shortages of resources. Unfortunately, estimating the deaths directly attributable to the event is frequently problematic. Hence, the excess mortality, defined as the difference between the observed mortality and that which would have occurred in the absence of the crisis event, is an estimation target. If the region of interest contains a functioning vital registration system, so that the mortality is fully observed and reliable, then the only modeling required is to produce the expected deaths counts, but this is a nontrivial exercise. In low- and middle-income countries it is common for there to be incomplete (or nonexistent) mortality data, and one must then use additional data and/or modeling, including predicting mortality using auxiliary variables. We describe and review each of these aspects, give examples of excess mortality studies, and provide a case study on excess mortality across states of the United States during the COVID-19 pandemic.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"6 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Likelihood in Functional Data Analysis","authors":"Hsin-wen Chang, Ian W. McKeague","doi":"10.1146/annurev-statistics-112723-034225","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034225","url":null,"abstract":"Functional data analysis (FDA) studies data that include infinite-dimensional functions or objects, generalizing traditional univariate or multivariate observations from each study unit. Among inferential approaches without parametric assumptions, empirical likelihood (EL) offers a principled method in that it extends the framework of parametric likelihood ratio–based inference via the nonparametric likelihood. There has been increasing use of EL in FDA due to its many favorable properties, including self-normalization and the data-driven shape of confidence regions. This article presents a review of EL approaches in FDA, starting with finite-dimensional features, then covering infinite-dimensional features. We contrast smooth and nonsmooth frameworks in FDA and show how EL has been incorporated into both of them. The article concludes with a discussion of some future research directions, including the possibility of applying EL to conformal inference.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"60 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Mediation Analysis for Integrating Exposure, Genomic, and Phenotype Data","authors":"Haoyu Yang, Zhonghua Liu, Ruoyu Wang, En-Yu Lai, Joel Schwartz, Andrea A. Baccarelli, Yen-Tsung Huang, Xihong Lin","doi":"10.1146/annurev-statistics-040622-031653","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040622-031653","url":null,"abstract":"Causal mediation analysis provides an attractive framework for integrating diverse types of exposure, genomic, and phenotype data. Recently, this field has seen a surge of interest, largely driven by the increasing need for causal mediation analyses in health and social sciences. This article aims to provide a review of recent developments in mediation analysis, encompassing mediation analysis of a single mediator and a large number of mediators, as well as mediation analysis with multiple exposures and mediators. Our review focuses on the recent advancements in statistical inference for causal mediation analysis, especially in the context of high-dimensional mediation analysis. We delve into the complexities of testing mediation effects, especially addressing the challenge of testing a large number of composite null hypotheses. Through extensive simulation studies, we compare the existing methods across a range of scenarios. We also include an analysis of data from the Normative Aging Study, which examines DNA methylation CpG sites as potential mediators of the effect of smoking status on lung function. We discuss the pros and cons of these methods and future research directions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"26 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142555732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designs for Vaccine Studies","authors":"M. Elizabeth Halloran","doi":"10.1146/annurev-statistics-033121-120121","DOIUrl":"https://doi.org/10.1146/annurev-statistics-033121-120121","url":null,"abstract":"Due to dependent happenings, vaccines can have different effects in populations. In addition to direct protective effects in the vaccinated, vaccination in a population can have indirect effects in the unvaccinated individuals. Vaccination can also reduce person-to-person transmission to vaccinated individuals or from vaccinated individuals compared with unvaccinated individuals. Design of vaccine studies has a history extending back over a century. Emerging infectious diseases, such as the SARS-CoV-2 pandemic and the Ebola outbreak in West Africa, have stimulated new interest in vaccine studies. We focus on some recent developments, such as target trial emulation, test-negative design, and regression discontinuity design. Methods for evaluating durability of vaccine effects were developed in the context of both blinded and unblinded placebo crossover studies. The case-ascertained design is used to assess the transmission effects of vaccines. The novel ring vaccination trial design was first used in the Ebola outbreak in West Africa.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"5 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142555733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation, and Blackwell's Theorem","authors":"Weijie J. Su","doi":"10.1146/annurev-statistics-112723-034158","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034158","url":null,"abstract":"Differential privacy is widely considered the formal privacy for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Although differential privacy originated in the cryptographic context, in this review we argue that, fundamentally, it can be considered a pure statistical concept. We leverage Blackwell's informativeness theorem and focus on demonstrating that the definition of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of f-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render f-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and US Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared with existing alternatives.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"55 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142449536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reproducibility in the Classroom","authors":"Mine Dogucu","doi":"10.1146/annurev-statistics-112723-034436","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034436","url":null,"abstract":"Difficulties in reproducing results from scientific studies have lately been referred to as a reproducibility crisis. Scientific practice depends heavily on scientific training. What gets taught in the classroom is often practiced in labs, fields, and data analysis. The importance of reproducibility in the classroom has gained momentum in statistics education in recent years. In this article, we review the existing literature on reproducibility education. We delve into the relationship between computing tools and reproducibility through visiting historical developments in this area. We share examples for teaching reproducibility and reproducible teaching while discussing the pedagogical opportunities created by these examples as well as challenges that the instructors should be aware of. We detail the use of teaching reproducibility and reproducible teaching practices in an introductory data science course. Lastly, we provide recommendations on reproducibility education for instructors, administrators, and other members of the scientific community.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"2 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142398006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Additive Models","authors":"Simon N. Wood","doi":"10.1146/annurev-statistics-112723-034249","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034249","url":null,"abstract":"Generalized additive models are generalized linear models in which the linear predictor includes a sum of smooth functions of covariates, where the shape of the functions is to be estimated. They have also been generalized beyond the original generalized linear model setting to distributions outside the exponential family and to situations in which multiple parameters of the response distribution may depend on sums of smooth functions of covariates. The widely used computational and inferential framework in which the smooth terms are represented as latent Gaussian processes, splines, or Gaussian random effects is reviewed, paying particular attention to the case in which computational and theoretical tractability is obtained by prior rank reduction of the model terms. An empirical Bayes approach is taken, and its relatively good frequentist performance discussed, along with some more overtly frequentist approaches to model selection. Estimation of the degree of smoothness of component functions via cross validation or marginal likelihood is covered, alongside the computational strategies required in practice, including when data and models are reasonably large. It is briefly shown how the framework extends easily to location-scale modeling, and, with more effort, to techniques such as quantile regression. Also covered are the main classes of smooths of multiple covariates that may be included in models: isotropic splines and tensor product smooth interaction terms.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"39 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142384162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}