Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J Tibshirani
Title: Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Journal: Annals of Statistics
DOI: 10.1214/21-aos2133 (https://doi.org/10.1214/21-aos2133)
Published: 2022-04-01 (Epub 2022-04-07)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9481183/pdf/nihms-1830540.pdf
Citations: 579
Abstract
Interpolators (estimators that achieve zero training error) have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters $p$ is of the same order as the number of samples $n$. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
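As a rough illustration of the setup in the abstract, the sketch below draws features from both models and computes a minimum ℓ2 norm interpolator with the pseudoinverse. It is not taken from the paper: the Gaussian entries, the identity covariance, the 1/sqrt(d) scaling of W, the tanh activation, the noise level, the problem sizes, and all function names are assumptions chosen only to make the example run.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum l2-norm least squares solution, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

def linear_features(n, p, sigma_sqrt, rng):
    """Rows x_i = Sigma^{1/2} z_i, with z_i having i.i.d. entries (Gaussian here, an assumption)."""
    Z = rng.standard_normal((n, p))
    return Z @ sigma_sqrt  # sigma_sqrt is assumed symmetric

def nonlinear_features(n, p, d, phi, rng):
    """Rows x_i = phi(W z_i), with W a p-by-d matrix of i.i.d. entries (scaling is an assumption)."""
    Z = rng.standard_normal((n, d))
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    return phi(Z @ W.T)  # phi acts componentwise

rng = np.random.default_rng(0)
n, p, d = 50, 100, 20  # overparametrized: p > n

# Linear feature model with Sigma = I (assumed for simplicity).
X_lin = linear_features(n, p, np.eye(p), rng)
beta = rng.standard_normal(p) / np.sqrt(p)       # hypothetical true coefficients
y = X_lin @ beta + 0.1 * rng.standard_normal(n)  # noisy responses

beta_hat = min_norm_interpolator(X_lin, y)
train_err = np.max(np.abs(X_lin @ beta_hat - y))  # close to zero: the fit interpolates

# Nonlinear feature model with a tanh activation (assumed).
X_nl = nonlinear_features(n, p, d, np.tanh, rng)

print(f"max training residual (linear model): {train_err:.2e}")
print(f"nonlinear feature matrix shape: {X_nl.shape}")
```

When p > n and X has full row rank, infinitely many coefficient vectors fit the training data exactly; the pseudoinverse selects the one with the smallest ℓ2 norm, which is also the limit of ridge regression as the penalty goes to zero (hence "ridgeless").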
About the journal:
The Annals of Statistics aims to publish research papers of the highest quality reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied and interdisciplinary statistics. Many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics and in substantive scientific fields.