{"title":"Second-Generation Functional Data","authors":"Salil Koner, Ana-Maria Staicu","doi":"10.1146/annurev-statistics-032921-033726","DOIUrl":"https://doi.org/10.1146/annurev-statistics-032921-033726","url":null,"abstract":"Modern studies from a variety of fields record multiple functional observations according to either multivariate, longitudinal, spatial, or time series designs. We refer to such data as second-generation functional data because their analysis (unlike typical functional data analysis, which assumes independence of the functions) accounts for the complex dependence between the functional observations and requires more advanced methods. In this article, we provide an overview of the techniques for analyzing second-generation functional data with a focus on highlighting the key methodological intricacies that stem from the need for modeling complex dependence, compared with independent functional data. For each of the four types of second-generation functional data presented (multivariate functional data, longitudinal functional data, functional time series, and spatially functional data) we discuss how the widely popular functional principal component analysis can be extended to these settings to define, identify main directions of variation, and describe dependence among the functions. In addition to modeling, we also discuss prediction, statistical inference, and application to clustering. We close by discussing future directions in this area.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43209001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Brief Tour of Deep Learning from a Statistical Perspective","authors":"Eric T. Nalisnick, Padhraic Smyth, Dustin Tran","doi":"10.1146/annurev-statistics-032921-013738","DOIUrl":"https://doi.org/10.1146/annurev-statistics-032921-013738","url":null,"abstract":"We expose the statistical foundations of deep learning with the goal of facilitating conversation between the deep learning and statistics communities. We highlight core themes at the intersection; summarize key neural models, such as feedforward neural networks, sequential neural networks, and neural latent variable models; and link these ideas to their roots in probability and statistics. We also highlight research directions in deep learning where there are opportunities for statistical contributions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45067520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Surrogate Endpoints in Clinical Trials","authors":"M. Elliott","doi":"10.1146/annurev-statistics-032921-035359","DOIUrl":"https://doi.org/10.1146/annurev-statistics-032921-035359","url":null,"abstract":"Surrogate markers are often used in clinical trials settings when obtaining a final outcome to evaluate the effectiveness of a treatment requires a long wait, is expensive to obtain, or both. Formal definitions of surrogate marker quality resulting from a large variety of estimation approaches have been proposed over the years. I review this work, with a particular focus on approaches that use the causal inference paradigm, as these conceptualize a good marker as one in the causal pathway between the treatment and outcome. I also focus on efforts to evaluate the risk of a surrogate paradox, a damaging situation where the surrogate is positively associated with the outcome, and the causal effect of the treatment on the surrogate is in a helpful direction, but the ultimate causal effect of the treatment on the outcome is harmful. I then review some recent work in robust surrogate marker estimation and conclude with a discussion and suggestions for future research.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":" ","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43592539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Data Privacy: A Song of Privacy and Utility","authors":"Aleksandra Slavković, Jeremy Seeman","doi":"10.1146/annurev-statistics-033121-112921","DOIUrl":"https://doi.org/10.1146/annurev-statistics-033121-112921","url":null,"abstract":"To quantify trade-offs between increasing demand for open data sharing and concerns about sensitive information disclosure, statistical data privacy (SDP) methodology analyzes data release mechanisms that sanitize outputs based on confidential data. Two dominant frameworks exist: statistical disclosure control (SDC) and the more recent differential privacy (DP). Despite framing differences, both SDC and DP share the same statistical problems at their core. For inference problems, either we may design optimal release mechanisms and associated estimators that satisfy bounds on disclosure risk measures, or we may adjust existing sanitized output to create new statistically valid and optimal estimators. Regardless of design or adjustment, in evaluating risk and utility, valid statistical inferences from mechanism outputs require uncertainty quantification that accounts for the effect of the sanitization mechanism that introduces bias and/or variance. In this review, we discuss the statistical foundations common to both SDC and DP, highlight major developments in SDP, and present exciting open research problems in private inference.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136096457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-Based Change-Point Analysis","authors":"Hao Chen, Lynna Chu","doi":"10.1146/annurev-statistics-122121-033817","DOIUrl":"https://doi.org/10.1146/annurev-statistics-122121-033817","url":null,"abstract":"Recent technological advances allow for the collection of massive data in the study of complex phenomena over time and/or space in various fields. Many of these data involve sequences of high-dimensional or non-Euclidean measurements, where change-point analysis is a crucial early step in understanding the data. Segmentation, or offline change-point analysis, divides data into homogeneous temporal or spatial segments, making subsequent analysis easier; its online counterpart detects changes in sequentially observed data, allowing for real-time anomaly detection. This article reviews a nonparametric change-point analysis framework that utilizes graphs representing the similarity between observations. This framework can be applied to data as long as a reasonable dissimilarity distance among the observations can be defined. Thus, this framework can be applied to a wide range of applications, from high-dimensional data to non-Euclidean data, such as imaging data or network data. In addition, analytic formulas can be derived to control the false discoveries, making them easy off-the-shelf data analysis tools.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"1 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43096913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Applications to Cognitive Diagnostic Testing","authors":"Susu Zhang, Jingchen Liu, Zhiliang Ying","doi":"10.1146/annurev-statistics-033021-111803","DOIUrl":"https://doi.org/10.1146/annurev-statistics-033021-111803","url":null,"abstract":"Diagnostic classification tests are designed to assess examinees’ discrete mastery status on a set of skills or attributes. Such tests have gained increasing attention in educational and psychological measurement. We review diagnostic classification models and their applications to testing and learning, discuss their statistical and machine learning connections and related challenges, and introduce some contemporary and future extensions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136096465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Deep Learning for Spatial and Spatiotemporal Data","authors":"Christopher K. Wikle, Andrew Zammit-Mangion","doi":"10.1146/annurev-statistics-033021-112628","DOIUrl":"https://doi.org/10.1146/annurev-statistics-033021-112628","url":null,"abstract":"Deep neural network models have become ubiquitous in recent years and have been applied to nearly all areas of science, engineering, and industry. These models are particularly useful for data that have strong dependencies in space (e.g., images) and time (e.g., sequences). Indeed, deep models have also been extensively used by the statistical community to model spatial and spatiotemporal data through, for example, the use of multilevel Bayesian hierarchical models and deep Gaussian processes. In this review, we first present an overview of traditional statistical and machine learning perspectives for modeling spatial and spatiotemporal data, and then focus on a variety of hybrid models that have recently been developed for latent process, data, and parameter specifications. These hybrid models integrate statistical modeling ideas with deep neural network models in order to take advantage of the strengths of each modeling paradigm. We conclude by giving an overview of computational technologies that have proven useful for these hybrid models, and with a brief discussion on future research directions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"59 18","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50166556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Survival Analysis: Methods and Applications","authors":"Stephen Salerno, Yi Li","doi":"10.1146/annurev-statistics-032921-022127","DOIUrl":"https://doi.org/10.1146/annurev-statistics-032921-022127","url":null,"abstract":"In the era of precision medicine, time-to-event outcomes such as time to death or progression are routinely collected, along with high-throughput covariates. These high-dimensional data defy classical survival regression models, which are either infeasible to fit or likely to incur low predictability due to over-fitting. To overcome this, recent emphasis has been placed on developing novel approaches for feature selection and survival prognostication. We will review various cutting-edge methods that handle survival outcome data with high-dimensional predictors, highlighting recent innovations in machine learning approaches for survival prediction. We will cover the statistical intuitions and principles behind these methods and conclude with extensions to more complex settings, where competing events are observed. We exemplify these methods with applications to the Boston Lung Cancer Survival Cohort study, one of the largest cancer epidemiology cohorts investigating the complex mechanisms of lung cancer.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"10 1","pages":"25-49"},"PeriodicalIF":7.9,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10038209/pdf/nihms-1836646.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10295345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Study of Two Machine Learning Prediction Models Based on Heart Disease Data","authors":"倩 代","doi":"10.12677/sa.2023.124114","DOIUrl":"https://doi.org/10.12677/sa.2023.124114","url":null,"abstract":"","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"36 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86916688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of the Status Quo of Fitness Participation and Its Influencing Factors from the Perspective of Fitness Consumer Satisfaction","authors":"灵灵 吴","doi":"10.12677/sa.2023.124099","DOIUrl":"https://doi.org/10.12677/sa.2023.124099","url":null,"abstract":"","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"47 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74499594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}