{"title":"Statistics in Phonetics","authors":"Shahin Tavakoli, Beatrice Matteo, Davide Pigoli, Eleanor Chodroff, John Coleman, Michele Gubian, Margaret E.L. Renwick, Morgan Sonderegger","doi":"10.1146/annurev-statistics-112723-034642","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034642","url":null,"abstract":"Phonetics is the scientific field concerned with the study of how speech is produced, heard, and perceived. It abounds with data, such as acoustic speech recordings, neuroimaging data, or articulatory data. In this article, we provide an introduction to different areas of phonetics (acoustic phonetics, sociophonetics, speech perception, articulatory phonetics, speech inversion, sound change, and speech technology), an overview of the statistical methods for analyzing their data, and an introduction to the signal processing methods commonly applied to speech recordings. A major transition in the statistical modeling of phonetic data has been the shift from fixed effects to random effects regression models, the modeling of curve data (for instance, via generalized additive mixed models or functional data analysis methods), and the use of Bayesian methods. This shift has been driven in part by the increased focus on large speech corpora in phonetics, which has arisen from machine learning methods such as forced alignment. We conclude by identifying opportunities for future research.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"32 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142363010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hawkes Models and Their Applications","authors":"Patrick J. Laub, Young Lee, Philip K. Pollett, Thomas Taimre","doi":"10.1146/annurev-statistics-112723-034304","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034304","url":null,"abstract":"The Hawkes process is a model for counting the number of arrivals to a system that exhibits the self-exciting property—that one arrival creates a heightened chance of further arrivals in the near future. The model and its generalizations have been applied in a plethora of disparate domains, though two particularly developed applications are in seismology and in finance. As the original model is elegantly simple, generalizations have been proposed that track marks for each arrival, are multivariate, have a spatial component, are driven by renewal processes, treat time as discrete, and so on. This article creates a cohesive review of the traditional Hawkes model and the modern generalizations, providing details on their construction and simulation algorithms, and giving key references to the appropriate literature for a detailed treatment.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"58 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142363011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification and Inference with Invalid Instruments","authors":"Hyunseung Kang, Zijian Guo, Zhonghua Liu, Dylan Small","doi":"10.1146/annurev-statistics-112723-034721","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034721","url":null,"abstract":"Instrumental variables (IVs) are widely used to study the causal effect of an exposure on an outcome in the presence of unmeasured confounding. IVs require an instrument, a variable that (a) is associated with the exposure, (b) has no direct effect on the outcome except through the exposure, and (c) is not related to unmeasured confounders. Unfortunately, finding variables that satisfy conditions b or c can be challenging in practice. This article reviews works where instruments may not satisfy conditions b or c, which we refer to as invalid instruments. We review identification and inference under different violations of b or c, specifically under linear models, nonlinear models, and heteroskedastic models. We conclude with an empirical comparison of various methods by reanalyzing the effect of body mass index on systolic blood pressure from the UK Biobank.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"735 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142321065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring the Functioning Human Brain","authors":"Martin A. Lindquist, Bonnie B. Smith, Arunkumar Kannan, Angela Zhao, Brian Caffo","doi":"10.1146/annurev-statistics-040522-100329","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-100329","url":null,"abstract":"The emergence of functional magnetic resonance imaging (fMRI) marked a significant technological breakthrough in the real-time measurement of the functioning human brain in vivo. In part because of their 4D nature (three spatial dimensions and time), fMRI data have inspired a great deal of statistical development in the past couple of decades to address their unique spatiotemporal properties. This article provides an overview of the current landscape in functional brain measurement, with a particular focus on fMRI, highlighting key developments in the past decade. Furthermore, it looks ahead to the future, discussing unresolved research questions in the community and outlining potential research topics for the future.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"50 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142170864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Gene–Environment Interaction Analysis","authors":"Mengyun Wu, Yingmeng Li, Shuangge Ma","doi":"10.1146/annurev-statistics-112723-034315","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034315","url":null,"abstract":"Beyond the main genetic and environmental effects, gene–environment (G–E) interactions have been demonstrated to significantly contribute to the development and progression of complex diseases. Published analyses of G–E interactions have primarily used a supervised framework to model both low-dimensional environmental factors and high-dimensional genetic factors in relation to disease outcomes. In this article, we aim to provide a selective review of methodological developments in G–E interaction analysis from a statistical perspective. The three main families of techniques are hypothesis testing, variable selection, and dimension reduction, which lead to three general frameworks: testing-based, estimation-based, and prediction-based. Linear- and nonlinear-effects analysis, fixed- and random-effects analysis, marginal and joint analysis, and Bayesian and frequentist analysis are reviewed to facilitate the conduct of interaction analysis in a wide range of situations with various assumptions and objectives. Statistical properties, computations, applications, and future directions are also discussed.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"28 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142170865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Theoretical Review of Modern Robust Statistics","authors":"Po-Ling Loh","doi":"10.1146/annurev-statistics-112723-034446","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034446","url":null,"abstract":"Robust statistics is a fairly mature field that dates back to the early 1960s, with many foundational concepts having been developed in the ensuing decades. However, the field has drawn a new surge of attention in the past decade, largely due to a desire to recast robust statistical principles in the context of high-dimensional statistics. In this article, we begin by reviewing some of the central ideas in classical robust statistics. We then discuss the need for new theory in high dimensions, using recent work in high-dimensional M-estimation as an illustrative example. Next, we highlight a variety of interesting recent topics that have drawn a flurry of research activity from both statisticians and theoretical computer scientists, demonstrating the need for further research in robust estimation that embraces new estimation and contamination settings, as well as a greater emphasis on computational tractability in high dimensions.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"15 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142022156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crafting 10 Years of Statistics Explanations: Points of Significance","authors":"Naomi Altman, Martin Krzywinski","doi":"10.1146/annurev-statistics-112723-034555","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034555","url":null,"abstract":"Points of Significance is an ongoing series of short articles about statistics in Nature Methods that started in 2013. Its aim is to provide clear explanations of essential concepts in statistics for a nonspecialist audience. The articles favor heuristic explanations and make extensive use of simulated examples and graphical explanations, while maintaining mathematical rigor. Topics range from basic, but often misunderstood, such as uncertainty and p-values, to relatively advanced, but often neglected, such as the error-in-variables problem and the curse of dimensionality. More recent articles have focused on timely topics such as modeling of epidemics, machine learning, and neural networks. In this article, we discuss the evolution of topics and details behind some of the story arcs, our approach to crafting statistical explanations and narratives, and our use of figures and numerical simulations as props for building understanding.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"13 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142022040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Data Integration for Health Policy Evidence-Building","authors":"Susan M. Paddock, Carolina Franco, F. Jay Breidt, Brenda Betancourt","doi":"10.1146/annurev-statistics-112723-034507","DOIUrl":"https://doi.org/10.1146/annurev-statistics-112723-034507","url":null,"abstract":"Health policy evidence-building requires data sources such as health care claims, electronic health records, probability and nonprobability survey data, epidemiological surveillance databases, administrative data, and more, all of which have strengths and limitations for a given policy analysis. Data integration techniques leverage the relative strengths of input sources to obtain a blended source that is richer, more informative, and more fit for use than any single input component. This review notes the expansion of opportunities to use data integration for health policy analyses, reviews key methodological approaches to expand the number of variables in a data set or to increase the precision of estimates, and provides directions for future research. As data quality improvement motivates data integration, key data quality frameworks are provided to structure assessments of candidate input data sources.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"135 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142007340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convergence Diagnostics for Entity Resolution","authors":"Serge Aleshin-Guendel, Rebecca C. Steorts","doi":"10.1146/annurev-statistics-040522-114848","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-114848","url":null,"abstract":"Entity resolution is the process of merging and removing duplicate records from multiple data sources, often in the absence of unique identifiers. Bayesian models for entity resolution allow one to include a priori information, quantify uncertainty in important applications, and directly estimate a partition of the records. Markov chain Monte Carlo (MCMC) sampling is the primary computational method for approximate posterior inference in this setting, but due to the high dimensionality of the space of partitions, there are no agreed upon standards for diagnosing nonconvergence of MCMC sampling. In this article, we review Bayesian entity resolution, with a focus on the specific challenges that it poses for the convergence of a Markov chain. We review prior methods for convergence diagnostics, discussing their weaknesses. We provide recommendations for using MCMC sampling for Bayesian entity resolution, focusing on the use of modern diagnostics that are commonplace in applied Bayesian statistics. Using simulated data, we find that a commonly used Gibbs sampler performs poorly compared with two alternatives.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"4 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140642691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Role of the Bayes Factor in the Evaluation of Evidence","authors":"Colin Aitken, Franco Taroni, Silvia Bozza","doi":"10.1146/annurev-statistics-040522-101020","DOIUrl":"https://doi.org/10.1146/annurev-statistics-040522-101020","url":null,"abstract":"The use of the Bayes factor as a metric for the assessment of the probative value of forensic scientific evidence is largely supported by recommended standards in different disciplines. The application of Bayesian networks enables the consideration of problems of increasing complexity. The lack of a widespread consensus concerning key aspects of evidence evaluation and interpretation, such as the adequacy of a probabilistic framework for handling uncertainty or the method by which conclusions regarding how the strength of the evidence should be reported to a court, has meant the role of the Bayes factor in the administration of criminal justice has come under increasing challenge in recent years. We review the many advantages the Bayes factor has as an approach to the evaluation and interpretation of evidence.","PeriodicalId":48855,"journal":{"name":"Annual Review of Statistics and Its Application","volume":"19 1","pages":""},"PeriodicalIF":7.9,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140642773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}