{"title":"Integrative Analysis of Multimodal Omics Data","authors":"Gen Li, Eric F. Lock","doi":"10.1146/annurev-statistics-042424-113016","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-113016","url":null,"abstract":"With advancements in technology and the decreasing cost of data acquisition, high-throughput omics data have become increasingly prevalent in biomedical research. These data are often collected across multiple omics modalities at different molecular levels, offering a comprehensive perspective on underlying biological mechanisms. However, the multimodal nature of multiomics data presents unique and complex challenges for statistical analysis. In this article, we provide a comprehensive review of recent advancements in statistical methods for multiomics data integration. We discuss key topics in unsupervised learning (including dimension reduction, clustering, and network analysis), supervised learning (including regression, classification, and mediation analysis), and other areas. Finally, we highlight unresolved challenges and propose promising directions for future research to further advance the field.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"114 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145203439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change-Point Detection and Its Modern Applications","authors":"Jialiang Li, Jingli Wang, Yuetao Yu","doi":"10.1146/annurev-statistics-041124-044143","DOIUrl":"https://doi.org/10.1146/annurev-statistics-041124-044143","url":null,"abstract":"We review recent advances in change-point detection methods across three important fields of statistics: (a) We first present a subgroup identification method based on a multi-threshold change plane model where the subgroup boundaries are defined by a high-dimensional hyperplane in the covariate space. Subjects grouped into different regions may receive more individualized treatments in medical research studies and achieve improved health outcomes. (b) We then consider the estimation of discontinuity for functional process data. Many longitudinal or functional responses may exhibit abrupt jumps, and our methodology effectively accommodates such complicated nonsmooth features. (c) Finally, we explore change-point estimation within dynamic networks using a recently proposed network autoregressive model. This framework demonstrates that community structures in networks can shift similarly to changes observed in time series data. These reviews highlight the wide-ranging applications of change-point detection methodologies in modern data analysis.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"43 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disasters, Statistics, and the Humanitarian Sector","authors":"Hamish William Patten, Zineb Bhaby","doi":"10.1146/annurev-statistics-042424-061122","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-061122","url":null,"abstract":"This article examines the role of statistics in the humanitarian sector, with a particular focus on disasters caused by natural hazards. It begins by outlining current applications, including primary data collection, anticipatory action frameworks, Earth observation, mobile positioning data, and artificial intelligence. It then highlights key challenges such as gaps and biases in disaster impact and response data, difficulties in communicating statistical findings clearly, inequities in aid allocation, and the widespread outsourcing of statistics-related work. In exploring future applications, the article discusses the potential of impact-based early warning models, dynamic population data, and artificial intelligence to enhance communication and decision-making. Throughout, emphasis is placed on the need for interoperable systems as well as ethical and inclusive data practices. In doing so, the article presents statistics as both a diagnostic and strategic tool for strengthening the effectiveness, fairness, and responsiveness of humanitarian action in disaster contexts.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"85 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145043391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Enemies of Reliable and Useful Clinical Prediction Models: A Review of Statistical and Scientific Challenges","authors":"Ben Van Calster, Maarten van Smeden, Wouter van Amsterdam, Maarten Coemans, Laure Wynants, Ewout W. Steyerberg","doi":"10.1146/annurev-statistics-042324-123749","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042324-123749","url":null,"abstract":"The current status of applied clinical prediction modeling is poor. Many models are developed with suboptimal methods and are not evaluated, and hence have little impact on clinical care. We review 12 challenges—provocatively labeled enemies—that jeopardize the creation of prediction models that make it to clinical practice to improve treatment decisions and clinical outcomes for individual patients. The challenges cover four areas: context, data, design and analysis, and scientific culture. We provide negative examples and recommendations for improvement, but also highlight positive examples and developments. Greater awareness of the complexities surrounding clinical prediction modeling is needed among researchers, funding agencies, health professionals as end users, and all of us as potential patients. To improve the utility of prediction models for healthcare and society, we need fewer but better models as well as more resources for model validation, impact assessment, and implementation.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"20 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144915659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Based Spatial Data Fusion","authors":"Alan E. Gelfand, Erin M. Schliep","doi":"10.1146/annurev-statistics-042424-052920","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-052920","url":null,"abstract":"With increased data collection, the need to fuse data sources has emerged as an important and rapidly growing research activity in the statistical community. In considering spatial and spatio-temporal datasets to examine complex environmental and ecological processes of interest, we often have multiple sources that are jointly informative about features of interest of the processes. Model-based data fusion aims to leverage information from these sources to improve inference and prediction. In the spatial statistics setting, these data could be geostatistical; areal; or point patterns with varying spatial resolutions, supports, and domains. Given two or more sources, we explore stochastic modeling to implement a suitable fusion with full inference and uncertainty quantification. We illustrate these ideas using three environmental and ecological examples: precipitation, marine mammal abundance, and joint species distributions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"10 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144910905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demystifying Inference After Adaptive Experiments","authors":"Aurélien Bibaut, Nathan Kallus","doi":"10.1146/annurev-statistics-040522-015431","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-015431","url":null,"abstract":"Adaptive experiments such as multi-armed bandits adapt the treatment-allocation policy and/or the decision to stop the experiment to the data observed so far. This has the potential to improve outcomes for study participants within the experiment, to improve the chance of identifying the best treatments after the experiment, and to avoid wasting data. As an experiment (and not just a continually optimizing system), it is still desirable to draw statistical inferences with frequentist guarantees. The concentration inequalities and union bounds that generally underlie adaptive experimentation algorithms can yield overly conservative inferences, but at the same time, the asymptotic normality we would usually appeal to in nonadaptive settings can be imperiled by adaptivity. In this article we aim to explain why, how, and when adaptivity is in fact an issue for inference and, when it is, to understand the various ways to fix it: reweighting to stabilize variances and recover asymptotic normality, using always-valid inference based on joint normality of an asymptotic limiting sequence, and characterizing and inverting the nonnormal distributions induced by adaptivity.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"26 1","pages":"407-423"},"PeriodicalIF":7.9,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144565895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models","authors":"Namjoon Suh, Guang Cheng","doi":"10.1146/annurev-statistics-040522-013920","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-013920","url":null,"abstract":"In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"111 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142684813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Models and Rating Systems for Head-to-Head Competition","authors":"Mark E. Glickman, Albyn C. Jones","doi":"10.1146/annurev-statistics-040722-061813","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040722-061813","url":null,"abstract":"One of the most important tasks in sports analytics is the development of binary response models for head-to-head game outcomes to estimate team and player strength. We discuss commonly used probability models for game outcomes, including the Bradley–Terry and Thurstone–Mosteller models, as well as extensions to ties as a third outcome and to the inclusion of a home-field advantage. We consider dynamic extensions to these models to account for the evolution of competitor strengths over time. Full likelihood-based analyses of these time-varying models can be simplified into rating systems, such as the Elo and Glicko rating systems. We present other modern rating systems, including popular methods for online gaming, and novel systems that have been implemented for online chess and Go. The discussion of the analytic methods is accompanied by examples of where these approaches have been implemented for various gaming organizations, as well as a detailed application to National Basketball Association game outcomes.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"1 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142679222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Reinforcement Learning in Financial Applications","authors":"Yahui Bai, Yuhe Gao, Runzhe Wan, Sheng Zhang, Rui Song","doi":"10.1146/annurev-statistics-112723-034423","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034423","url":null,"abstract":"In recent years, there has been a growing trend of applying reinforcement learning (RL) in financial applications. This approach has shown great potential for decision-making tasks in finance. In this review, we present a comprehensive study of the applications of RL in finance and conduct a series of meta-analyses to investigate the common themes in the literature, such as the factors that most significantly affect RL's performance compared with traditional methods. Moreover, we identify challenges, including explainability, Markov decision process modeling, and robustness, that hinder the broader utilization of RL in the financial industry and discuss recent advancements in overcoming these challenges. Finally, we propose future research directions, such as benchmarking, contextual RL, multi-agent RL, and model-based RL to address these challenges and to further enhance the implementation of RL in finance.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"25 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142642981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Modeling of Longitudinal and Survival Data","authors":"Jane-Ling Wang, Qixian Zhong","doi":"10.1146/annurev-statistics-112723-034334","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034334","url":null,"abstract":"In medical studies, time-to-event outcomes such as time to death or relapse of a disease are routinely recorded along with longitudinal data that are observed intermittently during the follow-up period. For various reasons, marginal approaches to model the event time, corresponding to separate approaches for survival data/longitudinal data, tend to induce bias and lose efficiency. Instead, a joint modeling approach that brings the two types of data together can reduce or eliminate the bias and yield a more efficient estimation procedure. A well-established avenue for joint modeling is the joint likelihood approach that often produces semiparametric efficient estimators for the finite-dimensional parameter vectors in both models. Through a transformation survival model with an unspecified baseline hazard function, this review introduces joint modeling that accommodates both baseline covariates and time-varying covariates. The focus is on the major challenges faced by joint modeling and how they can be overcome. A review of available software implementations and a brief discussion of future directions of the field are also included.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"246 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142637200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}