Statistical Science: Latest Articles, Page 10

Rejoinder: Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons

IF 5.7 | Zone 1 (Mathematics)

Statistical Science | Pub Date: 2020-11-01 | DOI: 10.1214/20-sts733rej

T. Hastie, R. Tibshirani, R. Tibshirani

Citations: 5

Modern Variable Selection in Action: Comment on the Papers by HTT and BPV

Statistical Science | Pub Date: 2020-11-01 | DOI: 10.1214/20-sts808

E. George

Abstract: Let me begin by congratulating the authors of these two papers, hereafter HTT and BPV, for their superb contributions to the comparisons of methods for variable selection problems in high-dimensional regression. The methods considered are truly some of today’s leading contenders for coping with the size and complexity of big data problems of so much current importance. Not surprisingly, there is no clear winner here because the terrain of comparisons is so vast and complex, and no single method can dominate across all situations. The considered setups vary greatly in terms of the number of observations n, the number of predictors p, the number and relative sizes of the underlying nonzero regression coefficients, predictor correlation structures and signal-to-noise ratios (SNRs). And even these only scratch the surface of the infinite possibilities. Further, there is the additional issue of which performance measure is most important. Is the goal of an analysis exact variable selection or prediction or both? And what about computational speed and scalability? All these considerations would naturally depend on the practical application at hand. The methods compared by HTT and BPV have been unleashed by extraordinary developments in computational speed, and so it is tempting to distinguish them primarily by their novel implementation algorithms. In particular, the recent integer-optimization-related algorithms for variable selection differ in fundamental ways from the now widely adopted coordinate ascent algorithms for the lasso-related methods. Undoubtedly, the impressive improvements in computational speed unleashed by these algorithms are critical for the feasibility of practical applications.

However, the more fundamental story behind the performance differences has to do with the differences between the criteria that their algorithms are seeking to optimize. In an important sense, they are being guided by different solutions to the general variable selection problem. Focusing first on the paper of HTT, its main thrust appears to have been kindled by the computational breakthrough of Bertsimas, King and Mazumder (2016) (hereafter BKM), which had proposed a mixed integer opti…

Statistical Science, vol. 35, pp. 609-613.

Citations: 2

Discussion of “Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons”

Statistical Science | Pub Date: 2020-11-01 | DOI: 10.1214/20-sts807

R. Mazumder

Abstract: I warmly congratulate the authors Hastie, Tibshirani and Tibshirani (HTT) and Bertsimas, Pauphilet and Van Parys (BPV) for their excellent contributions and important perspectives on sparse regression. Due to space constraints, and my greater familiarity with the content and context of HTT (I have had numerous fruitful discussions with the authors regarding their work), I will focus my discussion on the HTT paper. HTT nicely articulate the relative merits of three canonical estimators in sparse regression: L0, L1 and (forward) stepwise selection. I am humbled that a premise of their work is an article I wrote with Bertsimas and King [4] (BKM). BKM showed that current Mixed Integer Optimization (MIO) algorithms allow us to compute best-subset solutions for problem instances (p ≈ 1000 features) much larger than a previous benchmark (software for best subsets in the R package leaps) that could only handle instances with p ≈ 30. HTT, by extending and refining the experiments performed by BKM, have helped clarify and deepen our understanding of L0, L1 and stepwise regression. They raise several intriguing questions that perhaps deserve further attention from the wider statistics and optimization communities. In this commentary, I will focus on some of the key points discussed in HTT, with a bias toward some of the recent work I have been involved in.

There is a large and rich body of work in high-dimensional statistics and related optimization techniques that I will not be able to discuss within the limited scope of my commentary.

Statistical Science, vol. 35, pp. 602-608.

Citations: 4
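
The three canonical estimators named in the Mazumder abstract (L0 or best subset, forward stepwise, and L1 or lasso) can be contrasted in a few lines of NumPy. This is a minimal sketch under stated assumptions: the function names, the simulated design and the penalty `lam=0.3` are illustrative choices, not taken from HTT, BKM or BPV. Best subset is solved here by brute-force enumeration rather than by the MIO formulation of BKM, so its cost grows as C(p, k) and it is only usable for small p; the lasso is fitted by cyclic coordinate descent, a common approach in lasso solvers.

```python
import itertools

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the columns `cols`."""
    Xs = X[:, list(cols)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.sum((y - Xs @ beta) ** 2))


def best_subset(X, y, k):
    """L0 / best subset: the size-k column set with the smallest RSS.
    Brute-force enumeration over all C(p, k) subsets (illustrative, not MIO)."""
    p = X.shape[1]
    return set(min(itertools.combinations(range(p), k), key=lambda c: rss(X, y, c)))


def forward_stepwise(X, y, k):
    """Greedy forward stepwise: repeatedly add the column that most reduces RSS."""
    active = []
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in active]
        active.append(min(candidates, key=lambda j: rss(X, y, active + [j])))
    return set(active)


def lasso_cd(X, y, lam, n_iter=500):
    """L1 / lasso: cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0) / n
    r = y.astype(float).copy()  # residual y - Xb, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # partial residual that excludes column j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b


# Hypothetical simulated data: 3 true signals among p = 8 independent predictors.
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

print("best subset:     ", best_subset(X, y, 3))
print("forward stepwise:", forward_stepwise(X, y, 3))
print("lasso support:   ", set(np.flatnonzero(lasso_cd(X, y, lam=0.3)).tolist()))
```

On strong, uncorrelated signals like these, all three methods would be expected to pick the first three columns. The comparisons discussed in these papers vary the SNR and the predictor correlation structure, which this toy setup holds fixed.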

A Conversation with J. Stuart (Stu) Hunter

Statistical Science | Pub Date: 2020-11-01 | DOI: 10.1214/19-sts766

R. D. Veaux

Citations: 0

Rejoinder: Sparse Regression: Scalable Algorithms and Empirical Performance

Statistical Science | Pub Date: 2020-11-01 | DOI: 10.1214/20-sts701rej

D. Bertsimas, J. Pauphilet, Bart P. G. Van Parys

Citations: 3

Parameter Restrictions for the Sake of Identification: Is There Utility in Asserting That Perhaps a Restriction Holds?

Statistical Science | Pub Date: 2020-09-25 | DOI: 10.1214/23-sts885

P. Gustafson

Abstract: Statistical modeling can involve a tension between assumptions and statistical identification. The law of the observable data may not uniquely determine the value of a target parameter without invoking a key assumption, and, while plausible, this assumption may not be obviously true in the scientific context at hand. Moreover, many key assumptions are untestable, so we cannot rely on the data to resolve the question of whether the target is legitimately identified. Working in the Bayesian paradigm, we consider the grey zone of situations where a key assumption, in the form of a parameter space restriction, is scientifically reasonable but not incontrovertible for the problem being tackled. Specifically, we investigate statistical properties that ensue if we structure a prior distribution to assert that 'maybe' or 'perhaps' the assumption holds. Technically, this simply devolves to using a mixture prior distribution that puts just some prior weight on the assumption, or one of several assumptions, holding.

However, while the construct is straightforward, there is very little literature discussing situations where Bayesian model averaging is employed across a mix of fully identified and partially identified models.

Citations: 1
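
The "perhaps the restriction holds" mixture prior in the Gustafson abstract can be made concrete in a toy normal model. Everything in this sketch is a hypothetical illustration rather than the paper's own example: the observed mean `ybar` is assumed to follow N(theta + delta, sigma2/n), the target theta has a N(0, s2) prior, and the identifying restriction is delta = 0. The prior puts weight `w` on the restriction and otherwise draws delta from N(0, tau2), so the posterior averages over a model in which theta is identified and one in which it is not.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)


def perhaps_posterior(ybar, n, sigma2=1.0, s2=1.0, tau2=1.0, w=0.5):
    """Hypothetical toy model: ybar ~ N(theta + delta, sigma2 / n), theta ~ N(0, s2).
    Prior: delta = 0 (the restriction) with probability w, else delta ~ N(0, tau2).
    Returns (posterior P(restriction), posterior mean of theta, posterior var of theta)."""
    v0 = s2 + sigma2 / n         # marginal variance of ybar when delta = 0
    v1 = s2 + tau2 + sigma2 / n  # marginal variance of ybar when delta is free
    m0 = w * normal_pdf(ybar, v0)
    m1 = (1 - w) * normal_pdf(ybar, v1)
    p0 = m0 / (m0 + m1)          # posterior probability that the restriction holds

    # Within each model (theta, ybar) is jointly normal with Cov(theta, ybar) = s2.
    mean0, var0 = s2 * ybar / v0, s2 - s2 ** 2 / v0
    mean1, var1 = s2 * ybar / v1, s2 - s2 ** 2 / v1

    # Moments of the two-component posterior mixture (law of total variance).
    mean = p0 * mean0 + (1 - p0) * mean1
    var = p0 * (var0 + mean0 ** 2) + (1 - p0) * (var1 + mean1 ** 2) - mean ** 2
    return p0, mean, var


for n in (10, 1_000, 100_000):
    p0, mean, var = perhaps_posterior(ybar=0.5, n=n)
    print(f"n={n:>7}: P(restriction | data)={p0:.3f}, E[theta]={mean:.3f}, Var[theta]={var:.3f}")
```

Holding `ybar` fixed, the posterior variance of theta in this toy setup levels off instead of shrinking to zero as n grows, because the data alone cannot settle whether the restriction holds.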

Identification of Causal Effects Within Principal Strata Using Auxiliary Variables

Statistical Science | Pub Date: 2020-08-06 | DOI: 10.1214/20-sts810

Zhichao Jiang, Peng Ding

Abstract: In causal inference, principal stratification is a framework for dealing with a posttreatment intermediate variable between a treatment and an outcome, in which the principal strata are defined by the joint potential values of the intermediate variable. Because the principal strata are not fully observable, the causal effects within them, also known as the principal causal effects, are not identifiable without additional assumptions. Several previous empirical studies leveraged auxiliary variables to improve the inference of principal causal effects. We establish a general theory for identification and estimation of the principal causal effects with auxiliary variables, which provides a solid foundation for statistical inference and more insights for model building in empirical research. In particular, we consider two commonly used strategies for principal stratification problems: principal ignorability, and the conditional independence between the auxiliary variable and the outcome given principal strata and covariates. For these two strategies, we give non-parametric and semi-parametric identification results without modeling assumptions on the outcome. When the assumptions of neither strategy are plausible, we propose a large class of flexible parametric and semi-parametric models for identifying principal causal effects.

Our theory not only establishes formal identification results for several models that have been used in previous empirical studies but also generalizes them to allow for different types of outcomes and intermediate variables.

Citations: 16
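
Why principal causal effects are not identified without extra assumptions can be seen by enumerating the strata in the simplest case, a binary treatment Z and a binary intermediate variable S. In this sketch (an illustration, not code from Jiang and Ding), each stratum is the pair (S(0), S(1)) of potential values of the intermediate variable; an observed cell (Z, S) reveals only S(Z), so it is compatible with more than one stratum. The optional `monotonicity` flag (S(1) >= S(0)) is a common extra assumption added here for illustration; it is not one of the two strategies named in the abstract.

```python
from itertools import product

# Principal strata for a binary intermediate variable: the pair (S(0), S(1)).
STRATA = list(product((0, 1), repeat=2))  # (0, 0), (0, 1), (1, 0), (1, 1)


def compatible_strata(z, s, monotonicity=False):
    """Strata consistent with observing treatment z and intermediate value s.
    The observed S equals S(z), so the stratum must have value s in position z.
    Under the (assumed) monotonicity S(1) >= S(0), stratum (1, 0) is ruled out."""
    out = []
    for s0, s1 in STRATA:
        if monotonicity and s0 > s1:
            continue
        if (s0, s1)[z] == s:
            out.append((s0, s1))
    return out


for z, s in product((0, 1), repeat=2):
    print(f"Z={z}, S={s}: all={compatible_strata(z, s)}, "
          f"monotone={compatible_strata(z, s, monotonicity=True)}")
```

Even under monotonicity, the cells (Z=1, S=1) and (Z=0, S=0) each remain mixtures of two strata; that gap is what assumptions such as principal ignorability, or auxiliary variables as in this paper, are used to close.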

Comment: Diagnostics and Kernel-based Extensions for Linear Mixed Effects Models with Endogenous Covariates

Statistical Science | Pub Date: 2020-08-01 | DOI: 10.1214/20-sts782

Hunyong Cho, Joshua P. Zitovsky, Xinyi Li, Minxin Lu, K. Shah, John Sperger, Matthew C. B. Tsilimigras, M. Kosorok

Citations: 0

Comment: On the Potential for Misuse of Outcome-Wide Study Designs, and Ways to Prevent It

Statistical Science | Pub Date: 2020-08-01 | DOI: 10.1214/20-sts769

S. Vansteelandt, O. Dukes

Citations: 2

Comment: Matching Methods for Observational Studies Derived from Large Administrative Databases