Infinite-width limit of deep linear neural networks

IF 3.1 · Zone 1 (Mathematics) · Q1 MATHEMATICS

Communications on Pure and Applied Mathematics Pub Date : 2024-05-06 DOI:10.1002/cpa.22200

Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli

Citations: 0

Abstract

This paper studies the infinite-width limit of deep linear neural networks (NNs) initialized with random parameters. We show that, when the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from gradient descent on an infinitely wide deterministic linear NN. Moreover, even though the weights remain random, we obtain their precise law along the training dynamics, and prove a quantitative convergence result for the linear predictor in terms of the number of parameters. We finally study the continuous-time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
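As a rough illustration of the objects named in the abstract, the following sketch trains a finite-width deep linear network by gradient descent and compares its end-to-end linear predictor with a minimum-norm reference. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar output, the squared-loss risk, the Gaussian initialization with standard deviation 1/sqrt(fan_in), the values of the width, depth and learning rate, and all function names. The reference predictor is obtained by plain gradient descent on the predictor started at zero, which is known to approach the minimal $\ell_2$-norm least-squares solution for a small enough step size.

```python
import math
import random

def matvec(W, v):
    """Return W @ v for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mat_t_vec(W, v):
    """Return W^T @ v."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

def init_weights(d, width, depth, rng):
    """Gaussian init with std 1/sqrt(fan_in): an assumed scaling, not the paper's."""
    sizes = [d] + [width] * (depth - 1) + [1]
    return [[[rng.gauss(0.0, 1.0) / math.sqrt(fan_in) for _ in range(fan_in)]
             for _ in range(fan_out)]
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def predictor(Ws):
    """End-to-end linear predictor beta = (W_L ... W_1)^T, a vector in R^d."""
    a = [1.0]
    for W in reversed(Ws):
        a = mat_t_vec(W, a)
    return a

def risk(X, y, beta):
    """Squared-loss risk (1 / 2n) * ||X beta - y||^2."""
    return sum((p - t) ** 2 for p, t in zip(matvec(X, beta), y)) / (2 * len(y))

def gd_step(Ws, X, y, lr):
    """One simultaneous gradient-descent step on all layers."""
    # lefts[k] = (W_L ... W_{k+2})^T in 0-based indexing: the factor left of layer k.
    lefts, a = [None] * len(Ws), [1.0]
    for k in range(len(Ws) - 1, -1, -1):
        lefts[k] = a
        a = mat_t_vec(Ws[k], a)
    beta = a
    resid = [p - t for p, t in zip(matvec(X, beta), y)]
    b = [v / len(y) for v in mat_t_vec(X, resid)]  # gradient of the risk in beta
    new_Ws = []
    for k, W in enumerate(Ws):
        # Gradient w.r.t. layer k is the outer product of lefts[k] and b.
        new_Ws.append([[W[i][j] - lr * lefts[k][i] * b[j] for j in range(len(b))]
                       for i in range(len(W))])
        b = matvec(W, b)  # push the gradient through the old layer k
    return new_Ws

def train(Ws, X, y, lr, steps):
    for _ in range(steps):
        Ws = gd_step(Ws, X, y, lr)
    return Ws

def run_demo(d=6, n=3, width=16, depth=3, lr=0.002, steps=3000, seed=0):
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = matvec(X, [rng.gauss(0.0, 1.0) for _ in range(d)])  # n < d: many exact fits
    Ws0 = init_weights(d, width, depth, rng)
    beta_deep = predictor(train(Ws0, X, y, lr, steps))
    # Reference: a depth-1 "network" started at zero is plain GD on beta.
    beta_ref = predictor(train([[[0.0] * d]], X, y, 0.02, steps))
    gap = math.sqrt(sum((p - q) ** 2 for p, q in zip(beta_deep, beta_ref)))
    return {"init_risk": risk(X, y, predictor(Ws0)),
            "final_risk": risk(X, y, beta_deep),
            "beta_deep": beta_deep, "beta_ref": beta_ref, "gap": gap}

result = run_demo()
print("risk before/after training:", result["init_risk"], result["final_risk"])
print("distance to the minimum-norm reference:", result["gap"])
```

At a fixed finite width with this initialization, the printed distance need not be small; the abstract's statement concerns the infinite-width, continuous-time limit, and the sketch makes no claim about how the gap behaves as `width` grows.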




Source journal

Communications on Pure and Applied Mathematics (Mathematics - Mathematics)

CiteScore

6.70

Self-citation rate

3.30%

Articles published

Review time

>12 weeks

Journal description: Communications on Pure and Applied Mathematics (ISSN 0010-3640) is published monthly, one volume per year, by John Wiley & Sons, Inc. © 2019. The journal primarily publishes papers originating at or solicited by the Courant Institute of Mathematical Sciences. It features recent developments in applied mathematics, mathematical physics, and mathematical analysis. The topics include partial differential equations, computer science, and applied mathematics. CPAM is devoted to mathematical contributions to the sciences; both theoretical and applied papers, of original or expository type, are included.