{"title":"Robust high dimensional factor models with applications to statistical machine learning.","authors":"Jianqing Fan, Kaizheng Wang, Yiqiao Zhong, Ziwei Zhu","doi":"10.1214/20-sts785","DOIUrl":"10.1214/20-sts785","url":null,"abstract":"Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements that arise frequently from various applications from genomics and neuroscience to economics and finance. As data are collected at an ever-growing scale, statistical machine learning faces some new challenges: high dimensionality, strong dependence among observed variables, heavy-tailed variables and heterogeneity. High-dimensional robust factor analysis serves as a powerful toolkit to conquer these challenges. This paper gives a selective overview of recent advances in high-dimensional factor models and their applications to statistics including Factor-Adjusted Robust Model selection (FarmSelect) and Factor-Adjusted Robust Multiple testing (FarmTest). We show that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference. We highlight PCA and its connections to matrix perturbation theory, robust statistics, random projection, false discovery rate, etc., and illustrate through several applications how insights from these fields yield solutions to modern challenges. We also present far-reaching connections between factor models and popular statistical learning problems, including network analysis and low-rank matrix recovery.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"36 2","pages":"303-327"},"PeriodicalIF":3.9,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8315369/pdf/nihms-1639567.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39254018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment: On the History and Limitations of Probability Updating","authors":"G. Shafer","doi":"10.1214/21-STS765A","DOIUrl":"https://doi.org/10.1214/21-STS765A","url":null,"abstract":"Gong and Meng show that we can gain insights into classical paradoxes about conditional probability by acknowledging that apparently precise probabilities live within a larger world of imprecise probability. They also show that the notion of updating becomes problematic in this larger world. A closer look at the historical development of the notion of updating can give us further insights into its limitations.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42864081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment: Moving Beyond Sets of Probabilities","authors":"G. Wheeler","doi":"10.1214/21-STS765C","DOIUrl":"https://doi.org/10.1214/21-STS765C","url":null,"abstract":"The theory of lower previsions is designed around the principles of coherence and sure-loss avoidance, thus steers clear of all the updating anomalies highlighted in Gong and Meng’s “Judicious Judgment Meets Unsettling Updating: Dilation, Sure Loss and Simpson’s Paradox” except dilation. In fact, the traditional problem with the theory of imprecise probability is that coherent inference is too complicated rather than unsettling. Progress has been made simplifying coherent inference by demoting sets of probabilities from fundamental building blocks to secondary representations that are derived or discarded as needed.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45118344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic Approximation: From Statistical Origin to Big-Data, Multidisciplinary Applications","authors":"T. Lai, Hongsong Yuan","doi":"10.1214/20-STS784","DOIUrl":"https://doi.org/10.1214/20-STS784","url":null,"abstract":"Stochastic approximation was introduced in 1951 to provide a new theoretical framework for root finding and optimization of a regression function in the then-nascent field of statistics. This review shows how it has evolved in response to other developments in statistics, notably time series and sequential analysis, and to applications in artificial intelligence, economics, and engineering. Its resurgence in the Big Data Era has led to new advances in both theory and applications of this microcosm of statistics and data science.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43506417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noncommutative Probability and Multiplicative Cascades","authors":"I. McKeague","doi":"10.1214/20-STS780","DOIUrl":"https://doi.org/10.1214/20-STS780","url":null,"abstract":"Various aspects of standard model particle physics might be explained by a suitably rich algebra acting on itself, as suggested by Furey (2015). The present paper develops the asymptotics of large causal tree diagrams that combine freely independent elements in such an algebra. The Marčenko–Pastur law and Wigner’s semicircle law are shown to emerge as limits of normalized sum-over-paths of nonnegative elements assigned to the edges of causal trees. These results are established in the setting of noncommutative probability. Trees with classically independent positive edge weights (random multiplicative cascades) were originally proposed by Mandelbrot as a model displaying the fractal features of turbulence. The novelty of the present work is the use of noncommutative (free) probability to allow the edge weights to take values in an algebra. An application to theoretical neuroscience is also discussed.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43723597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comment: On Focusing, Soft and Strong Revision of Choquet Capacities and Their Role in Statistics","authors":"Thomas Augustin, G. Schollmeyer","doi":"10.1214/21-STS765D","DOIUrl":"https://doi.org/10.1214/21-STS765D","url":null,"abstract":"We congratulate Ruobin Gong and Xiao-Li Meng on their thought-provoking paper demonstrating the power of imprecise probabilities in statistics. In particular, Gong and Meng clarify important statistical paradoxes by discussing them in the framework of generalized uncertainty quantification and different conditioning rules used for updating. In this note, we characterize all three conditioning rules as envelopes of certain sets of conditional probabilities. This view also suggests some generalizations that can be seen as compromise rules. Similar to Gong and Meng, our derivations mainly focus on Choquet capacities of order 2, and so we also briefly discuss in general their role as statistical models. We conclude with some general remarks on the potential of imprecise probabilities to cope with the multidimensional nature of uncertainty.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41573357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rejoinder: Let’s Be Imprecise in Order to Be Precise (About What We Don’t Know)","authors":"Ruobin Gong, X. Meng","doi":"10.1214/21-STS765REJ","DOIUrl":"https://doi.org/10.1214/21-STS765REJ","url":null,"abstract":"Preparing a rejoinder is a typically rewarding, sometimes depressing, and occasionally frustrating experience. The rewarding part is self-evident, and the depression sets in when a discussant has much deeper and crisper insights about the authors’ thesis than authors themselves. Frustrations arise when the authors thought they made some points crystal clear, but the reflections from the discussants show a very different picture. We are deeply grateful to the editors of Statistical Science and the discussants for providing us an opportunity to maximize the first, sample the second, and minimize the third.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":" ","pages":""},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49551625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Conversation with Dennis Cook","authors":"E. Bura, Bing Li, Lexin Li, C. Nachtsheim, D. Peña, C. Setodji, R. Weiss","doi":"10.1214/20-STS801","DOIUrl":"https://doi.org/10.1214/20-STS801","url":null,"abstract":"Dennis Cook is a Full Professor, School of Statistics, at the University of Minnesota. He received his BS degree in Mathematics from Northern Montana College, and MS and PhD degrees in Statistics from Kansas State University. He has served as Chair of the Department of Applied Statistics, Director of the Statistical Center and Director of the School of Statistics, all at the University of Minnesota. His research areas include dimension reduction, linear and nonlinear regression, experimental design, statistical diagnostics, statistical graphics and population genetics. He has authored over 200 research articles and is author or co-author of two textbooks (An Introduction to Regression Graphics and Applied Regression Including Computing and Graphics) and three research monographs: Influence and Residuals in Regression, Regression Graphics: Ideas for Studying Regressions through Graphics and An Introduction to Envelopes: Dimension Reduction for Efficient Estimation in Multivariate Statistics. He has served as Associate Editor of the Journal of the American Statistical Association, The Journal of Quality Technology, Biometrika, Journal of the Royal Statistical Society and Statistica Sinica. He is a four-time recipient of the Jack Youden Prize for Best Expository Paper in Technometrics as well as the Frank Wilcoxon Award for Best Technical Paper. He received the 2005 COPSS Fisher Lecture and Award, and he is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics. The following conversation took place on March 22, 2019, following the banquet at the conference, “Cook’s Distance and Beyond: A Conference Celebrating the Contributions of R. Dennis Cook.” The interviewers were Efstathia Bura (Effie), Bing Li, Lexin Li, Christopher Nachtsheim (Chris), Daniel Pena, Claude Messan Setodji and Robert Weiss (Rob).","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"36 1","pages":"328-337"},"PeriodicalIF":5.7,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46070211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network Modeling in Biology: Statistical Methods for Gene and Brain Networks.","authors":"Y X Rachel Wang, Lexin Li, Jingyi Jessica Li, Haiyan Huang","doi":"10.1214/20-sts792","DOIUrl":"10.1214/20-sts792","url":null,"abstract":"The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.","PeriodicalId":51172,"journal":{"name":"Statistical Science","volume":"36 1","pages":"89-108"},"PeriodicalIF":3.9,"publicationDate":"2021-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8296984/pdf/nihms-1636819.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39219268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}