{"title":"GRID: A variable selection and structure discovery method for high dimensional nonparametric regression","authors":"F. Giordano, S. Lahiri, M. L. Parrella","doi":"10.1214/19-aos1846","DOIUrl":"https://doi.org/10.1214/19-aos1846","url":null,"abstract":"We consider nonparametric regression in high dimensions where only a relatively small subset of a large number of variables are relevant and may have nonlinear effects on the response. We develop methods for variable selection, structure discovery and estimation of the true low-dimensional regression function, allowing any degree of interactions among the relevant variables that need not be specified a priori. The proposed method, called the GRID, combines empirical likelihood based marginal testing with the local linear estimation machinery in a novel way to select the relevant variables. Further, it provides a simple graphical tool for identifying the low dimensional nonlinear structure of the regression function. Theoretical results establish consistency of variable selection and structure discovery, and also the oracle risk property of the GRID estimator of the regression function, allowing the dimension d of the covariates to grow with the sample size n at the rate d = O(n^a) for any a ∈ (0,∞) and the number of relevant covariates r to grow at a rate r = O(n^γ) for some γ ∈ (0, 1) under some regularity conditions that, in particular, require finiteness of certain absolute moments of the error variables depending on a. 
Finite sample properties of the GRID are investigated in a moderately large simulation study.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"1848-1874"},"PeriodicalIF":4.5,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48408022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ENTRYWISE EIGENVECTOR ANALYSIS OF RANDOM MATRICES WITH LOW EXPECTED RANK.","authors":"Emmanuel Abbe, Jianqing Fan, Kaizheng Wang, Yiqiao Zhong","doi":"10.1214/19-aos1854","DOIUrl":"10.1214/19-aos1854","url":null,"abstract":"<p><p>Recovering low-rank structures via eigenvector perturbation analysis is a common problem in statistical machine learning, such as in factor analysis, community detection, ranking, matrix completion, among others. While a large variety of bounds are available for average errors between empirical and population statistics of eigenvectors, few results are tight for entrywise analyses, which are critical for a number of problems such as community detection. This paper investigates entrywise behaviors of eigenvectors for a large class of random matrices whose expectations are low-rank, which helps settle the conjecture in Abbe et al. (2014b) that the spectral algorithm achieves exact recovery in the stochastic block model without any trimming or cleaning steps. The key is a first-order approximation of eigenvectors under the <i>ℓ</i> <sub>∞</sub> norm: <dispformula> <math> <mrow><msub><mi>u</mi> <mi>k</mi></msub> <mo>≈</mo> <mfrac><mrow><mi>A</mi> <msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> <mrow><msubsup><mi>λ</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> </mfrac> <mo>,</mo></mrow> </math> </dispformula> where {<i>u</i> <sub><i>k</i></sub> } and <math> <mrow><mrow><mo>{</mo> <mrow><msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> <mo>}</mo></mrow> </mrow> </math> are eigenvectors of a random matrix <i>A</i> and its expectation <math><mrow><mi>E</mi> <mi>A</mi></mrow> </math> , respectively. The fact that the approximation is both tight and linear in <i>A</i> facilitates sharp comparisons between <i>u</i> <sub><i>k</i></sub> and <math> <mrow><msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> </math> . 
In particular, it allows for comparing the signs of <i>u</i> <sub><i>k</i></sub> and <math> <mrow><msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> </math> even if <math> <mrow> <msub> <mrow><mrow><mo>‖</mo> <mrow><msub><mi>u</mi> <mi>k</mi></msub> <mo>-</mo> <msubsup><mi>u</mi> <mi>k</mi> <mo>*</mo></msubsup> </mrow> <mo>‖</mo></mrow> </mrow> <mi>∞</mi></msub> </mrow> </math> is large. The results are further extended to perturbations of eigenspaces, yielding new <i>ℓ</i> <sub>∞</sub>-type bounds for synchronization ( <math> <mrow><msub><mi>ℤ</mi> <mn>2</mn></msub> </mrow> </math> -spiked Wigner model) and noisy matrix completion.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 3","pages":"1452-1474"},"PeriodicalIF":4.5,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8046180/pdf/nihms-1053828.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38877757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics. Pub Date: 2020-04-01. Epub Date: 2020-05-26. DOI: 10.1214/19-aos1835
{"title":"A UNIFIED STUDY OF NONPARAMETRIC INFERENCE FOR MONOTONE FUNCTIONS.","authors":"Ted Westling, Marco Carone","doi":"10.1214/19-aos1835","DOIUrl":"10.1214/19-aos1835","url":null,"abstract":"<p><p>The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise convergence in distribution of a class of generalized Grenander-type estimators of a monotone function. This broad class allows the minorization or majoratization operation to be performed on a data-dependent transformation of the domain, possibly yielding benefits in practice. Additionally, we provide simpler conditions and more concrete distributional theory in the important case that the primitive estimator and data-dependent transformation function are asymptotically linear. We use our general results in the context of various well-studied problems, and show that we readily recover classical results established separately in each case. More importantly, we show that our results allow us to tackle more challenging problems involving parameters for which the use of flexible learning strategies appears necessary. 
In particular, we study inference on monotone density and hazard functions using informatively right-censored data, extending the classical work on independent censoring, and on a covariate-marginalized conditional mean function, extending the classical work on monotone regression functions.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 2","pages":"1001-1024"},"PeriodicalIF":4.5,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7377427/pdf/nihms-1597646.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38194372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference for Archimax copulas","authors":"Simon Chatelain, Anne-Laure Fougères, J. Nešlehová","doi":"10.1214/19-aos1836","DOIUrl":"https://doi.org/10.1214/19-aos1836","url":null,"abstract":"Archimax copula models can account for any type of asymptotic dependence between extremes and at the same time capture joint risks at medium levels. An Archimax copula is characterized by two functional parameters, the stable tail dependence function ℓ, and the Archimedean generator ψ which distorts the extreme-value dependence structure. This article develops semiparametric inference for Archimax copulas: a nonparametric estimator of ℓ and a moment-based estimator of ψ assuming the latter belongs to a parametric family. Conditions under which ψ and ℓ are identifiable are derived. The asymptotic behavior of the estimators is then established under broad regularity conditions; performance in small samples is assessed through a comprehensive simulation study. The Archimax copula model with the Clayton generator is then used to analyze monthly rainfall maxima at three stations in French Brittany. The model is seen to fit the data very well, both in the lower and in the upper tail. The nonparametric estimator of ℓ reveals asymmetric extremal dependence between the stations, which reflects heavy precipitation patterns in the area. 
Technical proofs, simulation results and R code are provided in the Online Supplement.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"1025-1051"},"PeriodicalIF":4.5,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44067400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hurst function estimation","authors":"Jinqi Shen, T. Hsing","doi":"10.1214/19-aos1825","DOIUrl":"https://doi.org/10.1214/19-aos1825","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"838-862"},"PeriodicalIF":4.5,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46240529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Worst-case versus average-case design for estimation from partial pairwise comparisons","authors":"A. Pananjady, Cheng Mao, Vidya Muthukumar, M. Wainwright, T. Courtade","doi":"10.1214/19-aos1838","DOIUrl":"https://doi.org/10.1214/19-aos1838","url":null,"abstract":"Pairwise comparison data arises in many domains, including tournament rankings, web search, and preference elicitation. Given noisy comparisons of a fixed subset of pairs of items, we study the problem of estimating the underlying comparison probabilities under the assumption of strong stochastic transitivity (SST). We also consider the noisy sorting subclass of the SST model. We show that when the assignment of items to the topology is arbitrary, these permutation-based models, unlike their parametric counterparts, do not admit consistent estimation for most comparison topologies used in practice. We then demonstrate that consistent estimation is possible when the assignment of items to the topology is randomized, thus establishing a dichotomy between worst-case and average-case designs. We propose two computationally efficient estimators in the average-case setting and analyze their risk, showing that it depends on the comparison topology only through the degree sequence of the topology. We also provide explicit classes of graphs for which the rates achieved by these estimators are optimal. 
Our results are corroborated by simulations on multiple comparison topologies.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"1072-1097"},"PeriodicalIF":4.5,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49323852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction error after model search","authors":"Xiaoying Tian","doi":"10.1214/19-AOS1818","DOIUrl":"https://doi.org/10.1214/19-AOS1818","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"763-784"},"PeriodicalIF":4.5,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45252274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bootstrap confidence regions based on M-estimators under nonstandard conditions","authors":"Stephen M. S. Lee, Puyudi Yang","doi":"10.1214/18-aos1803","DOIUrl":"https://doi.org/10.1214/18-aos1803","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"274-299"},"PeriodicalIF":4.5,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45596271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-step semiparametric empirical likelihood inference","authors":"Francesco Bravo, J. Escanciano, I. Keilegom","doi":"10.1214/18-AOS1788","DOIUrl":"https://doi.org/10.1214/18-AOS1788","url":null,"abstract":"","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":"48 1","pages":"1-26"},"PeriodicalIF":4.5,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46503774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}