Yiwei Tang, Junyu Chen, Deyuan Li, Huixia Judy Wang
{"title":"Recent Advances in Conditional Extreme Quantile Analysis","authors":"Yiwei Tang, Junyu Chen, Deyuan Li, Huixia Judy Wang","doi":"10.1146/annurev-statistics-042324-014139","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042324-014139","url":null,"abstract":"Estimating conditional extreme quantiles is essential for assessing tail risks in complex systems, with broad applications in finance, climate science, engineering, and beyond. While classical extreme value theory provides a foundational framework, recent advances, particularly semiparametric and nonparametric methods, including approaches based on quantile regression, machine learning, and deep learning, have greatly enriched the methodological landscape. This modern review synthesizes these developments, covering traditional likelihood-based methods, semiparametric approaches, and tree-based and deep learning techniques, including higher-order refinements.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"45 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2026-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erik Vanem, Sheng Dong, Guillaume de Hauteclocque, Thomas Berge Johannessen, Tingyao Zhu, Jasna Prpic-Orsic, Sanne van Essen, Kevin Ewans, Ed Mackay, Philip Jonathan
{"title":"Statistical Modeling of the Ocean Environment","authors":"Erik Vanem, Sheng Dong, Guillaume de Hauteclocque, Thomas Berge Johannessen, Tingyao Zhu, Jasna Prpic-Orsic, Sanne van Essen, Kevin Ewans, Ed Mackay, Philip Jonathan","doi":"10.1146/annurev-statistics-042424-115755","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-115755","url":null,"abstract":"Statistical modeling of the ocean environment is important for many practical applications in science and engineering. Probabilistic descriptions of the ocean environment are important input for structural design and risk assessment of marine structures, including ships, offshore and coastal structures, and aquaculture installations. They are also essential for the safe operation of ships and other structures at sea. Additionally, they are critical for planning and decision-making in the exploitation of marine renewable energy sources such as waves, tides, and offshore wind. This article presents a review of recent developments with regard to statistical modeling of the ocean environment, with a particular focus on ocean waves. Such developments are driven by an increasing volume of available data, increasing computational capabilities, and demand from the industry for more accurate and uncertainty-aware descriptions of relevant oceanic variables. Hence, statistical modeling of the ocean environment remains an active area of research, with significant developments in various directions. These are reviewed in this article.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"60 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2026-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theory of Random Forests","authors":"Erwan Scornet, Giles Hooker","doi":"10.1146/annurev-statistics-112723-034707","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034707","url":null,"abstract":"Random forests (RFs) have a long history; they were originally defined by Leo Breiman in 2001 but have antecedents in bagging methods introduced in 1996. They have become one of the most widely adopted machine learning tools thanks to their computational efficiency, relative insensitivity to tuning parameters, inbuilt cross validation, and interpretation tools. Despite their popularity, mathematical theory about the fundamental properties of RFs has been slow to emerge. Nonetheless, the past decade has seen significant advances in our understanding and analysis of these algorithms. In this review article, we describe several variations of RFs and how rates of consistency of these variants highlight the impact of different RF mechanisms on their performance. Another line of research focuses on establishing central limit theorems and confidence intervals for RFs. We also depict recent analyses in variable importance computed with RFs.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"54 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2026-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Spatiotemporal Point Processes: Advances and New Directions","authors":"Xiuyuan Cheng, Zheng Dong, Yao Xie","doi":"10.1146/annurev-statistics-042324-040052","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042324-040052","url":null,"abstract":"Spatiotemporal point processes model discrete events distributed in space and time, with applications in criminology, seismology, epidemiology, and social networks. Classical models rely on parametric kernels, limiting their ability to capture heterogeneous, nonstationary dynamics. Recent advances integrate deep neural architectures, either by modeling the conditional intensity directly or by learning flexible, data-driven influence kernels. This article reviews the deep influence kernel approach, which balances statistical interpretability by retaining explicit kernels to capture event propagation, with expressive power from neural architectures. We outline key components, including functional basis decomposition, graph neural networks for encoding spatial or network structures, and both likelihood-based and likelihood-free estimation methods, while addressing scalability for large data. We also highlight theoretical results on kernel identifiability. Applications in crime analysis, earthquake aftershock prediction, and sepsis modeling demonstrate the framework's effectiveness. We conclude with promising directions for developing explainable and scalable deep kernel point processes.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"84 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145908310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eric C. Chi, Aaron J. Molstad, Zheming Gao, Jocelyn T. Chi
{"title":"The Why and How of Convex Clustering","authors":"Eric C. Chi, Aaron J. Molstad, Zheming Gao, Jocelyn T. Chi","doi":"10.1146/annurev-statistics-112723-034107","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034107","url":null,"abstract":"This article reviews a clustering method based on solving a convex optimization problem. Despite the plethora of existing clustering methods, convex clustering has several uncommon features that distinguish it from its predecessors. The optimization problem is free of spurious local minima, and its unique global minimizer is stable with respect to all its inputs, including the data, a tuning parameter, and weight hyperparameters. Its single tuning parameter controls the number of clusters and can be chosen using standard techniques from penalized regression. We give intuition into the behavior of and theory for convex clustering, as well as practical guidance. We highlight important algorithms and discuss how their computational costs scale with the problem size. Finally, we highlight the breadth of its uses and flexibility to be combined and integrated with other inferential methods.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"10 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145658281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Aspects of Racial and Ethnic Health Disparities","authors":"Jay S. Kaufman","doi":"10.1146/annurev-statistics-042324-061403","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042324-061403","url":null,"abstract":"Measurement and analysis of racial and ethnic health disparities are vital functions of government and academia in diverse societies, but the statistical methods for accomplishing this work are underdeveloped. Issues of measurement, aggregation, adjustment, choice of scale, internal validity, and generalizability are all paramount. Measurement of race and ethnicity is complicated by the fact that, as identities that form through historical and political processes, they are not stable over time and place, nor are they objectively verifiable. Similarly, it is impossible to specify an optimal adjustment set, because adjustments are functions of ethical judgments, not statistical criteria. Additional complications arise when decomposing disparities in relation to measured pathways, as well as in the modeling of multiple intersectional strata. The ethical considerations in model selection imply that measurement and modeling of health disparities can never be a purely statistical activity, but instead must be conducted in relation to a theory of justice.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"118 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145609859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Operator Learning: A Statistical Perspective","authors":"Unique Subedi, Ambuj Tewari","doi":"10.1146/annurev-statistics-042424-070908","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-070908","url":null,"abstract":"Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"29 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145567648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deependra K. Thapa, Erik S. Parker, Mounika Kandukuri, Xi (Rita) Wang, Thirupathi R. Mokalla, Olivia C. Robertson, Wasiuddin Najam, Andrew E. Teschendorff, Andrew W. Brown, John R. Speakman, Yisheng Peng, Bernard S. Gorman, Heping Zhang, Luis-Enrique Becerra-Garcia, Colby J. Vorland, David B. Allison
{"title":"Statistical Methods in Aging Research: Improving Current Practices and Embracing Emerging Approaches","authors":"Deependra K. Thapa, Erik S. Parker, Mounika Kandukuri, Xi (Rita) Wang, Thirupathi R. Mokalla, Olivia C. Robertson, Wasiuddin Najam, Andrew E. Teschendorff, Andrew W. Brown, John R. Speakman, Yisheng Peng, Bernard S. Gorman, Heping Zhang, Luis-Enrique Becerra-Garcia, Colby J. Vorland, David B. Allison","doi":"10.1146/annurev-statistics-042324-060005","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042324-060005","url":null,"abstract":"Aging research relies on varied statistical methods, and applying these methods appropriately is important for scientific rigor. However, proper use of these statistical techniques is a challenge. We discuss two categories of statistical methods in aging research: (a) emerging methods requiring further validation, including techniques to examine compression of morbidity, maximum lifespan, immortal time bias, molecular aging clocks, and treatment response heterogeneity, and (b) classic and existing methods needing reconsideration and improvement, such as stepwise regression, generalized linear models, methods for accounting for clustering and nesting effects, methods for testing for group differences, methods for mediation and moderation analyses, and nonlinear models. For each method, we review its relevance to aging research, highlight statistical issues, and suggest improvements or alternatives with examples from aging research. We urge researchers to refine traditional approaches and embrace emerging methods tailored to the unique challenges of aging research. This review will help researchers identify and apply sound statistical methods, thereby improving statistical rigor in aging research.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"101 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145545454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiguo Cao, Sidi Wu, Muye Nanshan, Haolun Shi, Liangliang Wang
{"title":"Statistical Learning for Functional Data","authors":"Jiguo Cao, Sidi Wu, Muye Nanshan, Haolun Shi, Liangliang Wang","doi":"10.1146/annurev-statistics-042424-052503","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-052503","url":null,"abstract":"Functional data analysis (FDA) is a rapidly growing field in modern statistics that provides powerful tools for analyzing data observed as curves, surfaces, or more general functions. Unlike traditional multivariate methods, FDA explicitly accounts for the smooth and continuous nature of functional data, enabling more accurate modeling and interpretation. Traditional FDA methods, such as functional principal component analysis, functional regression, and functional classification, rely on linear assumptions and basis function expansions, which can limit their effectiveness when applied to nonlinear, high-dimensional, or irregularly sampled data. Recent advances in neural networks provide promising alternatives to these traditional approaches. Deep learning methods offer several key advantages: They naturally capture nonlinear relationships, scale to high-dimensional data without explicit dimension reduction, learn task-specific representations directly from raw observations, and handle sparse or irregular sampling without requiring imputation. This article reviews recent methodological developments in FDA, with a focus on the integration of deep learning techniques. Through this comparative review, we highlight the strengths and limitations of classical and modern approaches, providing practical guidance and future directions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"155 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145536105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proper Scoring Rules for Estimation and Forecast Evaluation","authors":"Kartik Waghmare, Johanna Ziegel","doi":"10.1146/annurev-statistics-042424-050626","DOIUrl":"https://doi.org/10.1146/annurev-statistics-042424-050626","url":null,"abstract":"Proper scoring rules have been a subject of growing interest in recent years, not only as tools for evaluation of probabilistic forecasts but also as methods for estimating probability distributions. In this article, we review the mathematical foundations of proper scoring rules, including general characterization results and important families of scoring rules. We discuss their role in statistics and machine learning for estimation and forecast evaluation. Furthermore, we comment on interesting developments of their usage in applications.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"26 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145509527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}