Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

Alberto Bordino, S. Favaro, S. Fortini
{"title":"深度稳定神经网络的无限宽极限:亚线性、线性和超线性激活函数","authors":"Alberto Bordino, S. Favaro, S. Fortini","doi":"10.48550/arXiv.2304.04008","DOIUrl":null,"url":null,"abstract":"There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitable rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a ``joint growth\"and under the assumption of a ``sequential growth\"of the width over the NN's layers. Here, assuming a ``sequential growth\"of the width, we extend such a characterization to a general class of activation functions, which includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on the use of a generalized central limit theorem for heavy tails distributions, which allows for an interesting unified treatment of infinitely wide limits for deep Stable NNs. Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting.","PeriodicalId":432739,"journal":{"name":"Trans. Mach. Learn. Res.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions\",\"authors\":\"Alberto Bordino, S. Favaro, S. Fortini\",\"doi\":\"10.48550/arXiv.2304.04008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitable rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a ``joint growth\\\"and under the assumption of a ``sequential growth\\\"of the width over the NN's layers. Here, assuming a ``sequential growth\\\"of the width, we extend such a characterization to a general class of activation functions, which includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on the use of a generalized central limit theorem for heavy tails distributions, which allows for an interesting unified treatment of infinitely wide limits for deep Stable NNs. 
Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting.\",\"PeriodicalId\":432739,\"journal\":{\"name\":\"Trans. Mach. Learn. Res.\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Trans. Mach. Learn. Res.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2304.04008\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Trans. Mach. Learn. Res.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.04008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitably rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a "joint growth" and under the assumption of a "sequential growth" of the width over the NN's layers. Here, assuming a "sequential growth" of the width, we extend such a characterization to a general class of activation functions, which includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on the use of a generalized central limit theorem for heavy-tailed distributions, which allows for an interesting unified treatment of infinitely wide limits for deep Stable NNs. Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting.
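As a rough illustration of the width rescaling discussed in the abstract, the following is a minimal simulation sketch, not the authors' formal construction. The stability index alpha = 1.5, the tanh activation, the scalar input, and the widths are illustrative assumptions; weights are sampled with scipy.stats.levy_stable, and the output is normalized by n**(-1/alpha), the scaling suggested by the generalized central limit theorem for heavy-tailed summands (the Gaussian case would use n**(-1/2)).

```python
# Minimal sketch (assumptions: alpha, widths, tanh activation are illustrative).
import numpy as np
from scipy.stats import levy_stable

alpha = 1.5        # stability index of the weight distribution (0 < alpha <= 2)
n = 1000           # hidden-layer width
n_networks = 2000  # number of independent shallow networks to sample
x = 1.0            # a fixed scalar input

np.random.seed(0)

# Hidden pre-activations g_j = w_j * x with i.i.d. symmetric alpha-Stable w_j,
# passed through a sub-linear activation (tanh).
W1 = levy_stable.rvs(alpha, 0.0, size=(n_networks, n))
hidden = np.tanh(W1 * x)

# Output rescaled by n**(-1/alpha), the normalization suggested by the
# generalized central limit theorem for heavy-tailed summands.
W2 = levy_stable.rvs(alpha, 0.0, size=(n_networks, n))
output = n ** (-1.0 / alpha) * (W2 * hidden).sum(axis=1)

# The wide-width output stays heavy tailed: tail probabilities decay
# polynomially, roughly like t**(-alpha), unlike the Gaussian case.
for t in (5.0, 10.0, 20.0):
    print(f"P(|output| > {t:g}) ~ {np.mean(np.abs(output) > t):.4f}")
```

With a super-linear activation (e.g. x|x| instead of tanh), the same experiment would require a different rescaling exponent to obtain a non-degenerate limit, which reflects the activation-dependent scaling the paper analyzes.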