{"title":"-Stable convergence of heavy-/light-tailed infinitely wide neural networks","authors":"Paul Jung, Hoileong Lee, Jiho Lee, Hongseok Yang","doi":"10.1017/apr.2023.3","DOIUrl":null,"url":null,"abstract":"\n We consider infinitely wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric \n \n \n \n$\\alpha$\n\n \n -stable distribution, where \n \n \n \n$\\alpha\\in(0,2]$\n\n \n may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric \n \n \n \n$\\alpha$\n\n \n -stable distribution having the same \n \n \n \n$\\alpha$\n\n \n parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically seen to emerge in trained deep neural nets such as the ResNet and VGG series, and proven to naturally arise via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric \n \n \n \n$\\alpha$\n\n \n -stable distributions, \n \n \n \n$\\alpha\\in(0,2]$\n\n \n .","PeriodicalId":53160,"journal":{"name":"Advances in Applied Probability","volume":" ","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Applied Probability","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1017/apr.2023.3","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Abstract
We consider infinitely wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric $\alpha$-stable distribution, where $\alpha\in(0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $\alpha$-stable distribution having the same $\alpha$ parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically observed to emerge in trained deep neural nets such as the ResNet and VGG series, and proven to arise naturally via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $\alpha$-stable distributions, $\alpha\in(0,2]$.
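The limit described in the abstract can be probed numerically. Below is a minimal simulation sketch, not taken from the paper: it initializes a two-layer MLP with i.i.d. symmetric $\alpha$-stable weights and biases, applies an $n^{-1/\alpha}$ scaling to the sum over the wide hidden layer, and checks the tail index of the resulting pre-activation with a Hill estimator. The scaling convention, the tanh nonlinearity, the diagnostic, and all names and parameter values are illustrative assumptions, not the paper's construction.

```python
# Simulation sketch (assumptions only, not the paper's setup): pre-activations of a
# wide MLP with i.i.d. symmetric alpha-stable weights/biases and n**(-1/alpha) scaling
# should be approximately symmetric alpha-stable with the same tail index alpha.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def sas(alpha, size, scale=1.0):
    """Symmetric alpha-stable samples (skewness beta = 0)."""
    return levy_stable.rvs(alpha, 0.0, loc=0.0, scale=scale, size=size, random_state=rng)

alpha = 1.5          # assumed tail index, alpha in (0, 2]
n = 5_000            # hidden-layer width
d_in = 10            # input dimension
n_repeats = 2_000    # independent re-initializations of the network

x = rng.standard_normal(d_in)     # one fixed input
z2 = np.empty(n_repeats)          # pre-activation of a single second-layer node

for r in range(n_repeats):
    # Layer 1: i.i.d. symmetric alpha-stable weights and biases,
    # with an assumed d_in**(-1/alpha) input scaling.
    W1 = sas(alpha, (n, d_in))
    b1 = sas(alpha, n)
    h1 = np.tanh(d_in ** (-1.0 / alpha) * (W1 @ x) + b1)

    # Layer 2: sum of n terms scaled by n**(-1/alpha).
    w2 = sas(alpha, n)
    b2 = sas(alpha, 1)[0]
    z2[r] = n ** (-1.0 / alpha) * (w2 @ h1) + b2

# Hill estimator of the tail index of |z2|; it should be close to alpha
# if the stable limit describes the wide network.
abs_z = np.sort(np.abs(z2))[::-1]
k = 200
hill = 1.0 / np.mean(np.log(abs_z[:k] / abs_z[k]))
print(f"Hill tail-index estimate: {hill:.2f} (target alpha = {alpha})")
```

Setting `alpha = 2.0` in this sketch recovers the Gaussian (finite-variance) regime with the familiar $1/\sqrt{n}$ scaling.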
About the journal
The Advances in Applied Probability has been published by the Applied Probability Trust for over four decades, and is a companion publication to the Journal of Applied Probability. It contains mathematical and scientific papers of interest to applied probabilists, with emphasis on applications in a broad spectrum of disciplines, including the biosciences, operations research, telecommunications, computer science, engineering, epidemiology, financial mathematics, the physical and social sciences, and any field where stochastic modeling is used.
A submission to Applied Probability may, at the Editor-in-Chief’s discretion, appear in either the Journal of Applied Probability or the Advances in Applied Probability. Typically, shorter papers appear in the Journal, with longer contributions appearing in the Advances.