{"title":"A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models","authors":"Namjoon Suh, Guang Cheng","doi":"10.1146/annurev-statistics-040522-013920","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-013920","url":null,"abstract":"In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"111 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142684813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Models and Rating Systems for Head-to-Head Competition","authors":"Mark E. Glickman, Albyn C. Jones","doi":"10.1146/annurev-statistics-040722-061813","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040722-061813","url":null,"abstract":"One of the most important tasks in sports analytics is the development of binary response models for head-to-head game outcomes to estimate team and player strength. We discuss commonly used probability models for game outcomes, including the Bradley–Terry and Thurstone–Mosteller models, as well as extensions to ties as a third outcome and to the inclusion of a home-field advantage. We consider dynamic extensions to these models to account for the evolution of competitor strengths over time. Full likelihood-based analyses of these time-varying models can be simplified into rating systems, such as the Elo and Glicko rating systems. We present other modern rating systems, including popular methods for online gaming, and novel systems that have been implemented for online chess and Go. The discussion of the analytic methods are accompanied by examples of where these approaches have been implemented for various gaming organizations, as well as a detailed application to National Basketball Association game outcomes.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"1 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Reinforcement Learning in Financial Applications","authors":"Yahui Bai, Yuhe Gao, Runzhe Wan, Sheng Zhang, Rui Song","doi":"10.1146/annurev-statistics-112723-034423","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034423","url":null,"abstract":"In recent years, there has been a growing trend of applying reinforcement learning (RL) in financial applications. This approach has shown great potential for decision-making tasks in finance. In this review, we present a comprehensive study of the applications of RL in finance and conduct a series of meta-analyses to investigate the common themes in the literature, such as the factors that most significantly affect RL's performance compared with traditional methods. Moreover, we identify challenges, including explainability, Markov decision process modeling, and robustness, that hinder the broader utilization of RL in the financial industry and discuss recent advancements in overcoming these challenges. Finally, we propose future research directions, such as benchmarking, contextual RL, multi-agent RL, and model-based RL to address these challenges and to further enhance the implementation of RL in finance.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"25 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142642981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Modeling of Longitudinal and Survival Data","authors":"Jane-Ling Wang, Qixian Zhong","doi":"10.1146/annurev-statistics-112723-034334","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034334","url":null,"abstract":"In medical studies, time-to-event outcomes such as time to death or relapse of a disease are routinely recorded along with longitudinal data that are observed intermittently during the follow-up period. For various reasons, marginal approaches to model the event time, corresponding to separate approaches for survival data/longitudinal data, tend to induce bias and lose efficiency. Instead, a joint modeling approach that brings the two types of data together can reduce or eliminate the bias and yield a more efficient estimation procedure. A well-established avenue for joint modeling is the joint likelihood approach that often produces semiparametric efficient estimators for the finite-dimensional parameter vectors in both models. Through a transformation survival model with an unspecified baseline hazard function, this review introduces joint modeling that accommodates both baseline covariates and time-varying covariates. The focus is on the major challenges faced by joint modeling and how they can be overcome. A review of available software implementations and a brief discussion of future directions of the field are also included.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"246 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142637200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Methods for Amortized Inference","authors":"Andrew Zammit-Mangion, Matthew Sainsbury-Dale, Raphaël Huser","doi":"10.1146/annurev-statistics-112723-034123","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034123","url":null,"abstract":"Simulation-based methods for statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimization libraries, and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortized, in the sense that, after an initial setup cost, they allow rapid inference through fast feed-forward operations. In this article we review recent progress in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. We also cover software and include a simple illustration to showcase the wide array of tools available for amortized inference and the benefits they offer over Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"95 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infectious Disease Modeling","authors":"Jing Huang, Jeffrey S. Morris","doi":"10.1146/annurev-statistics-112723-034351","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034351","url":null,"abstract":"Infectious diseases pose a persistent challenge to public health worldwide. Recent global health crises, such as the COVID-19 pandemic and Ebola outbreaks, have underscored the vital role of infectious disease modeling in guiding public health policy and response. Infectious disease modeling is a critical tool for society, informing risk mitigation measures, prompting timely interventions, and aiding preparedness for healthcare delivery systems. This article synthesizes the current landscape of infectious disease modeling, emphasizing the integration of statistical methods in understanding and predicting the spread of infectious diseases. We begin by examining the historical context and the foundational models that have shaped the field, such as the SIR (susceptible, infectious, recovered) and SEIR (susceptible, exposed, infectious, recovered) models. Subsequently, we delve into the methodological innovations that have arisen, including stochastic modeling, network-based approaches, and the use of big data analytics. We also explore the integration of machine learning techniques in enhancing model accuracy and responsiveness. The review identifies the challenges of parameter estimation, model validation, and the incorporation of real-time data streams. Moreover, we discuss the ethical implications of modeling, such as privacy concerns and the communication of risk. The article concludes by discussing future directions for research, highlighting the need for data integration and interdisciplinary collaboration for advancing infectious disease modeling.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"15 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensors in High-Dimensional Data Analysis: Methodological Opportunities and Theoretical Challenges","authors":"Arnab Auddy, Dong Xia, Ming Yuan","doi":"10.1146/annurev-statistics-112723-034548","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034548","url":null,"abstract":"Large amounts of multidimensional data represented by multiway arrays or tensors are prevalent in modern applications across various fields such as chemometrics, genomics, physics, psychology, and signal processing. The structural complexity of such data provides vast new opportunities for modeling and analysis, but efficiently extracting information content from them, both statistically and computationally, presents unique and fundamental challenges. Addressing these challenges requires an interdisciplinary approach that brings together tools and insights from statistics, optimization, and numerical linear algebra, among other fields. Despite these hurdles, significant progress has been made in the past decade. This review seeks to examine some of the key advancements and identify common threads among them, under a number of different statistical settings.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"40 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Excess Mortality Estimation","authors":"Jon Wakefield, Victoria Knutson","doi":"10.1146/annurev-statistics-112723-034236","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034236","url":null,"abstract":"Estimating the mortality associated with a specific mortality crisis event (for example, a pandemic, natural disaster, or conflict) is clearly an important public health undertaking. In many situations, deaths may be directly or indirectly attributable to the mortality crisis event, and both contributions may be of interest. The totality of the mortality impact on the population (direct and indirect deaths) includes the knock-on effects of the event, such as a breakdown of the health care system, or increased mortality due to shortages of resources. Unfortunately, estimating the deaths directly attributable to the event is frequently problematic. Hence, the excess mortality, defined as the difference between the observed mortality and that which would have occurred in the absence of the crisis event, is an estimation target. If the region of interest contains a functioning vital registration system, so that the mortality is fully observed and reliable, then the only modeling required is to produce the expected deaths counts, but this is a nontrivial exercise. In low- and middle-income countries it is common for there to be incomplete (or nonexistent) mortality data, and one must then use additional data and/or modeling, including predicting mortality using auxiliary variables. We describe and review each of these aspects, give examples of excess mortality studies, and provide a case study on excess mortality across states of the United States during the COVID-19 pandemic.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"6 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Likelihood in Functional Data Analysis","authors":"Hsin-wen Chang, Ian W. McKeague","doi":"10.1146/annurev-statistics-112723-034225","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034225","url":null,"abstract":"Functional data analysis (FDA) studies data that include infinite-dimensional functions or objects, generalizing traditional univariate or multivariate observations from each study unit. Among inferential approaches without parametric assumptions, empirical likelihood (EL) offers a principled method in that it extends the framework of parametric likelihood ratio–based inference via the nonparametric likelihood. There has been increasing use of EL in FDA due to its many favorable properties, including self-normalization and the data-driven shape of confidence regions. This article presents a review of EL approaches in FDA, starting with finite-dimensional features, then covering infinite-dimensional features. We contrast smooth and nonsmooth frameworks in FDA and show how EL has been incorporated into both of them. The article concludes with a discussion of some future research directions, including the possibility of applying EL to conformal inference.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"60 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142601275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Mediation Analysis for Integrating Exposure, Genomic, and Phenotype Data","authors":"Haoyu Yang, Zhonghua Liu, Ruoyu Wang, En-Yu Lai, Joel Schwartz, Andrea A. Baccarelli, Yen-Tsung Huang, Xihong Lin","doi":"10.1146/annurev-statistics-040622-031653","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040622-031653","url":null,"abstract":"Causal mediation analysis provides an attractive framework for integrating diverse types of exposure, genomic, and phenotype data. Recently, this field has seen a surge of interest, largely driven by the increasing need for causal mediation analyses in health and social sciences. This article aims to provide a review of recent developments in mediation analysis, encompassing mediation analysis of a single mediator and a large number of mediators, as well as mediation analysis with multiple exposures and mediators. Our review focuses on the recent advancements in statistical inference for causal mediation analysis, especially in the context of high-dimensional mediation analysis. We delve into the complexities of testing mediation effects, especially addressing the challenge of testing a large number of composite null hypotheses. Through extensive simulation studies, we compare the existing methods across a range of scenarios. We also include an analysis of data from the Normative Aging Study, which examines DNA methylation CpG sites as potential mediators of the effect of smoking status on lung function. We discuss the pros and cons of these methods and future research directions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"26 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142555732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}