Fixed width treelike neural networks capacity analysis -- generic activations

Mihailo Stojnic
{"title":"固定宽度树状神经网络容量分析 -- 通用激活","authors":"Mihailo Stojnic","doi":"arxiv-2402.05696","DOIUrl":null,"url":null,"abstract":"We consider the capacity of \\emph{treelike committee machines} (TCM) neural\nnetworks. Relying on Random Duality Theory (RDT), \\cite{Stojnictcmspnncaprdt23}\nrecently introduced a generic framework for their capacity analysis. An upgrade\nbased on the so-called \\emph{partially lifted} RDT (pl RDT) was then presented\nin \\cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the\nnetworks with the most typical, \\emph{sign}, activations. Here, on the other\nhand, we focus on networks with other, more general, types of activations and\nshow that the frameworks of\n\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently\npowerful to enable handling of such scenarios as well. In addition to the\nstandard \\emph{linear} activations, we uncover that particularly convenient\nresults can be obtained for two very commonly used activations, namely, the\n\\emph{quadratic} and \\emph{rectified linear unit (ReLU)} ones. In more concrete\nterms, for each of these activations, we obtain both the RDT and pl RDT based\nmemory capacities upper bound characterization for \\emph{any} given (even)\nnumber of the hidden layer neurons, $d$. In the process, we also uncover the\nfollowing two, rather remarkable, facts: 1) contrary to the common wisdom, both\nsets of results show that the bounding capacity decreases for large $d$ (the\nwidth of the hidden layer) while converging to a constant value; and 2) the\nmaximum bounding capacity is achieved for the networks with precisely\n\\textbf{\\emph{two}} hidden layer neurons! Moreover, the large $d$ converging\nvalues are observed to be in excellent agrement with the statistical physics\nreplica theory based predictions.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fixed width treelike neural networks capacity analysis -- generic activations\",\"authors\":\"Mihailo Stojnic\",\"doi\":\"arxiv-2402.05696\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the capacity of \\\\emph{treelike committee machines} (TCM) neural\\nnetworks. Relying on Random Duality Theory (RDT), \\\\cite{Stojnictcmspnncaprdt23}\\nrecently introduced a generic framework for their capacity analysis. An upgrade\\nbased on the so-called \\\\emph{partially lifted} RDT (pl RDT) was then presented\\nin \\\\cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the\\nnetworks with the most typical, \\\\emph{sign}, activations. Here, on the other\\nhand, we focus on networks with other, more general, types of activations and\\nshow that the frameworks of\\n\\\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently\\npowerful to enable handling of such scenarios as well. In addition to the\\nstandard \\\\emph{linear} activations, we uncover that particularly convenient\\nresults can be obtained for two very commonly used activations, namely, the\\n\\\\emph{quadratic} and \\\\emph{rectified linear unit (ReLU)} ones. In more concrete\\nterms, for each of these activations, we obtain both the RDT and pl RDT based\\nmemory capacities upper bound characterization for \\\\emph{any} given (even)\\nnumber of the hidden layer neurons, $d$. 
In the process, we also uncover the\\nfollowing two, rather remarkable, facts: 1) contrary to the common wisdom, both\\nsets of results show that the bounding capacity decreases for large $d$ (the\\nwidth of the hidden layer) while converging to a constant value; and 2) the\\nmaximum bounding capacity is achieved for the networks with precisely\\n\\\\textbf{\\\\emph{two}} hidden layer neurons! Moreover, the large $d$ converging\\nvalues are observed to be in excellent agrement with the statistical physics\\nreplica theory based predictions.\",\"PeriodicalId\":501433,\"journal\":{\"name\":\"arXiv - CS - Information Theory\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2402.05696\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.05696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We consider the capacity of \emph{treelike committee machines} (TCM) neural networks. Relying on Random Duality Theory (RDT), \cite{Stojnictcmspnncaprdt23} recently introduced a generic framework for their capacity analysis. An upgrade based on the so-called \emph{partially lifted} RDT (pl RDT) was then presented in \cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on networks with the most typical, \emph{sign}, activations. Here, on the other hand, we focus on networks with other, more general, types of activations and show that the frameworks of \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently powerful to handle such scenarios as well. In addition to the standard \emph{linear} activations, we find that particularly convenient results can be obtained for two very commonly used activations, namely the \emph{quadratic} and \emph{rectified linear unit (ReLU)} ones. In more concrete terms, for each of these activations, we obtain both the RDT and pl RDT based memory capacity upper bound characterizations for \emph{any} given (even) number of hidden layer neurons, $d$. In the process, we also uncover the following two, rather remarkable, facts: 1) contrary to common wisdom, both sets of results show that the bounding capacity decreases for large $d$ (the width of the hidden layer) while converging to a constant value; and 2) the maximum bounding capacity is achieved for networks with precisely \textbf{\emph{two}} hidden layer neurons! Moreover, the large $d$ converging values are observed to be in excellent agreement with the statistical physics replica theory based predictions.
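For orientation, the following display is a sketch of the usual TCM setup and notation, not reproduced from the paper itself. A TCM with hidden-layer width $d$ splits the input $\mathbf{x}\in\mathbb{R}^{n}$ into $d$ disjoint blocks $\mathbf{x}^{(i)}\in\mathbb{R}^{n/d}$, one per hidden neuron, and with a fixed (unweighted) output layer computes
$$ y \;=\; \mathrm{sign}\Big(\sum_{i=1}^{d} f\big(\mathbf{w}_i^{T}\mathbf{x}^{(i)}\big)\Big), $$
where $f$ is the hidden-layer activation: $f(z)=\mathrm{sign}(z)$ in the earlier works, and $f(z)=z$ (linear), $f(z)=z^{2}$ (quadratic), or $f(z)=\max(z,0)$ (ReLU) in the present setting. The precise output-layer conventions for the non-sign activations follow \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23}; the memory capacity is then the largest ratio of stored patterns to input dimension for which such a network can realize the desired labels.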