{"title":"Designs for Vaccine Studies","authors":"M. Elizabeth Halloran","doi":"10.1146/annurev-statistics-033121-120121","DOIUrl":"https://doi.org/10.1146/annurev-statistics-033121-120121","url":null,"abstract":"Due to dependent happenings, vaccines can have different effects in populations. In addition to direct protective effects in the vaccinated, vaccination in a population can have indirect effects in the unvaccinated individuals. Vaccination can also reduce person-to-person transmission to vaccinated individuals or from vaccinated individuals compared with unvaccinated individuals. Design of vaccine studies has a history extending back over a century. Emerging infectious diseases, such as the SARS-CoV-2 pandemic and the Ebola outbreak in West Africa, have stimulated new interest in vaccine studies. We focus on some recent developments, such as target trial emulation, test-negative design, and regression discontinuity design. Methods for evaluating durability of vaccine effects were developed in the context of both blinded and unblinded placebo crossover studies. The case-ascertained design is used to assess the transmission effects of vaccines. The novel ring vaccination trial design was first used in the Ebola outbreak in West Africa.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"5 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142555733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Statistical Viewpoint on Differential Privacy: Hypothesis Testing, Representation, and Blackwell's Theorem","authors":"Weijie J. Su","doi":"10.1146/annurev-statistics-112723-034158","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034158","url":null,"abstract":"Differential privacy is widely considered the formal privacy for privacy-preserving data analysis due to its robust and rigorous guarantees, with increasingly broad adoption in public services, academia, and industry. Although differential privacy originated in the cryptographic context, in this review we argue that, fundamentally, it can be considered a pure statistical concept. We leverage Blackwell's informativeness theorem and focus on demonstrating that the definition of differential privacy can be formally motivated from a hypothesis testing perspective, thereby showing that hypothesis testing is not merely convenient but also the right language for reasoning about differential privacy. This insight leads to the definition of f-differential privacy, which extends other differential privacy definitions through a representation theorem. We review techniques that render f-differential privacy a unified framework for analyzing privacy bounds in data analysis and machine learning. Applications of this differential privacy definition to private deep learning, private convex optimization, shuffled mechanisms, and US Census data are discussed to highlight the benefits of analyzing privacy bounds under this framework compared with existing alternatives.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"55 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142449536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reproducibility in the Classroom","authors":"Mine Dogucu","doi":"10.1146/annurev-statistics-112723-034436","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034436","url":null,"abstract":"Difficulties in reproducing results from scientific studies have lately been referred to as a reproducibility crisis. Scientific practice depends heavily on scientific training. What gets taught in the classroom is often practiced in labs, fields, and data analysis. The importance of reproducibility in the classroom has gained momentum in statistics education in recent years. In this article, we review the existing literature on reproducibility education. We delve into the relationship between computing tools and reproducibility through visiting historical developments in this area. We share examples for teaching reproducibility and reproducible teaching while discussing the pedagogical opportunities created by these examples as well as challenges that the instructors should be aware of. We detail the use of teaching reproducibility and reproducible teaching practices in an introductory data science course. Lastly, we provide recommendations on reproducibility education for instructors, administrators, and other members of the scientific community.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"2 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142398006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Additive Models","authors":"Simon N. Wood","doi":"10.1146/annurev-statistics-112723-034249","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034249","url":null,"abstract":"Generalized additive models are generalized linear models in which the linear predictor includes a sum of smooth functions of covariates, where the shape of the functions is to be estimated. They have also been generalized beyond the original generalized linear model setting to distributions outside the exponential family and to situations in which multiple parameters of the response distribution may depend on sums of smooth functions of covariates. The widely used computational and inferential framework in which the smooth terms are represented as latent Gaussian processes, splines, or Gaussian random effects is reviewed, paying particular attention to the case in which computational and theoretical tractability is obtained by prior rank reduction of the model terms. An empirical Bayes approach is taken, and its relatively good frequentist performance discussed, along with some more overtly frequentist approaches to model selection. Estimation of the degree of smoothness of component functions via cross validation or marginal likelihood is covered, alongside the computational strategies required in practice, including when data and models are reasonably large. It is briefly shown how the framework extends easily to location-scale modeling, and, with more effort, to techniques such as quantile regression. Also covered are the main classes of smooths of multiple covariates that may be included in models: isotropic splines and tensor product smooth interaction terms.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"39 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142384162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistics in Phonetics","authors":"Shahin Tavakoli, Beatrice Matteo, Davide Pigoli, Eleanor Chodroff, John Coleman, Michele Gubian, Margaret E.L. Renwick, Morgan Sonderegger","doi":"10.1146/annurev-statistics-112723-034642","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034642","url":null,"abstract":"Phonetics is the scientific field concerned with the study of how speech is produced, heard, and perceived. It abounds with data, such as acoustic speech recordings, neuroimaging data, or articulatory data. In this article, we provide an introduction to different areas of phonetics (acoustic phonetics, sociophonetics, speech perception, articulatory phonetics, speech inversion, sound change, and speech technology), an overview of the statistical methods for analyzing their data, and an introduction to the signal processing methods commonly applied to speech recordings. A major transition in the statistical modeling of phonetic data has been the shift from fixed effects to random effects regression models, the modeling of curve data (for instance, via generalized additive mixed models or functional data analysis methods), and the use of Bayesian methods. This shift has been driven in part by the increased focus on large speech corpora in phonetics, which has arisen from machine learning methods such as forced alignment. We conclude by identifying opportunities for future research.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"32 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142363010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hawkes Models and Their Applications","authors":"Patrick J. Laub, Young Lee, Philip K. Pollett, Thomas Taimre","doi":"10.1146/annurev-statistics-112723-034304","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034304","url":null,"abstract":"The Hawkes process is a model for counting the number of arrivals to a system that exhibits the self-exciting property—that one arrival creates a heightened chance of further arrivals in the near future. The model and its generalizations have been applied in a plethora of disparate domains, though two particularly developed applications are in seismology and in finance. As the original model is elegantly simple, generalizations have been proposed that track marks for each arrival, are multivariate, have a spatial component, are driven by renewal processes, treat time as discrete, and so on. This article creates a cohesive review of the traditional Hawkes model and the modern generalizations, providing details on their construction and simulation algorithms, and giving key references to the appropriate literature for a detailed treatment.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"58 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142363011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification and Inference with Invalid Instruments","authors":"Hyunseung Kang, Zijian Guo, Zhonghua Liu, Dylan Small","doi":"10.1146/annurev-statistics-112723-034721","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034721","url":null,"abstract":"Instrumental variables (IVs) are widely used to study the causal effect of an exposure on an outcome in the presence of unmeasured confounding. IVs require an instrument, a variable that (a) is associated with the exposure, (b) has no direct effect on the outcome except through the exposure, and (c) is not related to unmeasured confounders. Unfortunately, finding variables that satisfy conditions b or c can be challenging in practice. This article reviews works where instruments may not satisfy conditions b or c, which we refer to as invalid instruments. We review identification and inference under different violations of b or c, specifically under linear models, nonlinear models, and heteroskedastic models. We conclude with an empirical comparison of various methods by reanalyzing the effect of body mass index on systolic blood pressure from the UK Biobank.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"735 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142321065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring the Functioning Human Brain","authors":"Martin A. Lindquist, Bonnie B. Smith, Arunkumar Kannan, Angela Zhao, Brian Caffo","doi":"10.1146/annurev-statistics-040522-100329","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-100329","url":null,"abstract":"The emergence of functional magnetic resonance imaging (fMRI) marked a significant technological breakthrough in the real-time measurement of the functioning human brain in vivo. In part because of their 4D nature (three spatial dimensions and time), fMRI data have inspired a great deal of statistical development in the past couple of decades to address their unique spatiotemporal properties. This article provides an overview of the current landscape in functional brain measurement, with a particular focus on fMRI, highlighting key developments in the past decade. Furthermore, it looks ahead to the future, discussing unresolved research questions in the community and outlining potential research topics for the future.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"50 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142170864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Gene–Environment Interaction Analysis","authors":"Mengyun Wu, Yingmeng Li, Shuangge Ma","doi":"10.1146/annurev-statistics-112723-034315","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034315","url":null,"abstract":"Beyond the main genetic and environmental effects, gene–environment (G–E) interactions have been demonstrated to significantly contribute to the development and progression of complex diseases. Published analyses of G–E interactions have primarily used a supervised framework to model both low-dimensional environmental factors and high-dimensional genetic factors in relation to disease outcomes. In this article, we aim to provide a selective review of methodological developments in G–E interaction analysis from a statistical perspective. The three main families of techniques are hypothesis testing, variable selection, and dimension reduction, which lead to three general frameworks: testing-based, estimation-based, and prediction-based. Linear- and nonlinear-effects analysis, fixed- and random-effects analysis, marginal and joint analysis, and Bayesian and frequentist analysis are reviewed to facilitate the conduct of interaction analysis in a wide range of situations with various assumptions and objectives. Statistical properties, computations, applications, and future directions are also discussed.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"28 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142170865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Theoretical Review of Modern Robust Statistics","authors":"Po-Ling Loh","doi":"10.1146/annurev-statistics-112723-034446","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034446","url":null,"abstract":"Robust statistics is a fairly mature field that dates back to the early 1960s, with many foundational concepts having been developed in the ensuing decades. However, the field has drawn a new surge of attention in the past decade, largely due to a desire to recast robust statistical principles in the context of high-dimensional statistics. In this article, we begin by reviewing some of the central ideas in classical robust statistics. We then discuss the need for new theory in high dimensions, using recent work in high-dimensional M-estimation as an illustrative example. Next, we highlight a variety of interesting recent topics that have drawn a flurry of research activity from both statisticians and theoretical computer scientists, demonstrating the need for further research in robust estimation that embraces new estimation and contamination settings, as well as a greater emphasis on computational tractability in high dimensions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"15 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142022156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}