On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent

IF 0.6 · Zone 4 (Mathematics) · JCR Q3 (Statistics & Probability)

Annals of the Institute of Statistical Mathematics, 76(3), 361-391. Pub Date: 2024-04-08. DOI: 10.1007/s10463-024-00898-6

Selina Drews, Michael Kohler


Citations: 0

Abstract

Estimation of a multivariate regression function from independent and identically distributed data is considered. An estimate is defined which fits a deep neural network consisting of a large number of fully connected neural networks, which are computed in parallel, via gradient descent to the data. The estimate is over-parametrized in the sense that the number of its parameters is much larger than the sample size. It is shown that with a suitable random initialization of the network, a sufficiently small gradient descent step size, and a number of gradient descent steps that slightly exceed the reciprocal of this step size, the estimate is universally consistent. This means that the expected \(L_2\) error converges to zero for all distributions of the data where the response variable is square integrable.
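The training recipe summarized in the abstract can be illustrated with a small NumPy sketch. This is not the authors' construction: the sub-networks here have a single hidden layer with a logistic activation, the output weights start at zero, the inner weights are drawn uniformly from a fixed interval, and the factor 1.2 used to make the step count "slightly exceed" the reciprocal of the step size is arbitrary. All names (`init_params`, `gradient_step`, `fit`, `K`, `r`, `scale`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, K, r, d, scale=5.0):
    # K parallel sub-networks, each with r hidden units on d inputs.
    # Assumed initialization: random inner weights, zero output weights.
    return {
        "W": rng.uniform(-scale, scale, size=(K, r, d)),
        "b": rng.uniform(-scale, scale, size=(K, r)),
        "v": np.zeros((K, r)),
    }

def hidden_layer(params, X):
    # Returns activations of shape (n, K, r) for inputs X of shape (n, d).
    return sigmoid(np.einsum("krd,nd->nkr", params["W"], X) + params["b"])

def predict(params, X):
    # The estimate is the sum of the outputs of all sub-networks.
    return np.einsum("nkr,kr->n", hidden_layer(params, X), params["v"])

def empirical_risk(params, X, y):
    return float(np.mean((predict(params, X) - y) ** 2))

def gradient_step(params, X, y, step):
    # One full-batch gradient descent step on the empirical L2 risk.
    n = X.shape[0]
    h = hidden_layer(params, X)
    resid = np.einsum("nkr,kr->n", h, params["v"]) - y
    g_v = (2.0 / n) * np.einsum("n,nkr->kr", resid, h)
    # Backpropagate through the logistic activation.
    delta = (2.0 / n) * resid[:, None, None] * params["v"][None] * h * (1.0 - h)
    g_b = delta.sum(axis=0)
    g_W = np.einsum("nkr,nd->krd", delta, X)
    return {
        "W": params["W"] - step * g_W,
        "b": params["b"] - step * g_b,
        "v": params["v"] - step * g_v,
    }

def fit(X, y, K=20, r=5, step=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(rng, K, r, X.shape[1])
    # Step count slightly above 1/step; the factor 1.2 is an assumption.
    n_steps = int(np.ceil(1.2 / step))
    for _ in range(n_steps):
        params = gradient_step(params, X, y, step)
    return params

# Toy data: n = 50 noisy observations of a smooth regression function.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)

risk_before = float(np.mean(y ** 2))  # output weights start at zero
params = fit(X, y)
risk_after = empirical_risk(params, X, y)
n_params = sum(p.size for p in params.values())
```

With these settings the model has 300 parameters for 50 observations, so it is over-parametrized in the sense used in the abstract. The sketch only shows the mechanics of the procedure; it says nothing about the conditions under which the consistency result holds.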


Source journal

Annals of the Institute of Statistical Mathematics (Mathematics: Statistics & Probability)

CiteScore: 2.00

Self-citation rate: 0.00%

Articles published: (no value listed)

Review time: 6-12 weeks

About the journal: Annals of the Institute of Statistical Mathematics (AISM) aims to provide a forum for open communication among statisticians, and to contribute to the advancement of statistics as a science to enable humans to handle information in order to cope with uncertainties. It publishes high-quality papers that shed new light on the theoretical, computational and/or methodological aspects of statistical science. Emphasis is placed on (a) development of new methodologies motivated by real data, (b) development of unifying theories, and (c) analysis and improvement of existing methodologies and theories.