Fixed width treelike neural networks capacity analysis -- generic activations

Mihailo Stojnic
{"title":"Fixed width treelike neural networks capacity analysis -- generic activations","authors":"Mihailo Stojnic","doi":"arxiv-2402.05696","DOIUrl":null,"url":null,"abstract":"We consider the capacity of \\emph{treelike committee machines} (TCM) neural\nnetworks. Relying on Random Duality Theory (RDT), \\cite{Stojnictcmspnncaprdt23}\nrecently introduced a generic framework for their capacity analysis. An upgrade\nbased on the so-called \\emph{partially lifted} RDT (pl RDT) was then presented\nin \\cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the\nnetworks with the most typical, \\emph{sign}, activations. Here, on the other\nhand, we focus on networks with other, more general, types of activations and\nshow that the frameworks of\n\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently\npowerful to enable handling of such scenarios as well. In addition to the\nstandard \\emph{linear} activations, we uncover that particularly convenient\nresults can be obtained for two very commonly used activations, namely, the\n\\emph{quadratic} and \\emph{rectified linear unit (ReLU)} ones. In more concrete\nterms, for each of these activations, we obtain both the RDT and pl RDT based\nmemory capacities upper bound characterization for \\emph{any} given (even)\nnumber of the hidden layer neurons, $d$. In the process, we also uncover the\nfollowing two, rather remarkable, facts: 1) contrary to the common wisdom, both\nsets of results show that the bounding capacity decreases for large $d$ (the\nwidth of the hidden layer) while converging to a constant value; and 2) the\nmaximum bounding capacity is achieved for the networks with precisely\n\\textbf{\\emph{two}} hidden layer neurons! Moreover, the large $d$ converging\nvalues are observed to be in excellent agrement with the statistical physics\nreplica theory based predictions.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.05696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We consider the capacity of \emph{treelike committee machines} (TCM) neural networks. Relying on Random Duality Theory (RDT), \cite{Stojnictcmspnncaprdt23} recently introduced a generic framework for their capacity analysis. An upgrade based on the so-called \emph{partially lifted} RDT (pl RDT) was then presented in \cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on networks with the most typical, \emph{sign}, activations. Here, on the other hand, we focus on networks with other, more general, types of activations and show that the frameworks of \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently powerful to handle such scenarios as well. In addition to the standard \emph{linear} activations, we uncover that particularly convenient results can be obtained for two very commonly used activations, namely, the \emph{quadratic} and \emph{rectified linear unit (ReLU)} ones. In more concrete terms, for each of these activations, we obtain both the RDT and pl RDT based memory capacity upper bound characterizations for \emph{any} given (even) number of hidden layer neurons, $d$. In the process, we also uncover the following two, rather remarkable, facts: 1) contrary to common wisdom, both sets of results show that the bounding capacity decreases for large $d$ (the width of the hidden layer) while converging to a constant value; and 2) the maximum bounding capacity is achieved for networks with precisely \textbf{\emph{two}} hidden layer neurons! Moreover, the large $d$ converging values are observed to be in excellent agreement with the statistical physics replica theory based predictions.
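
To fix notation, the following is a minimal sketch of the treelike committee machine model and the memory capacity notion the abstract refers to; the block structure and the symbols $\mathbf{w}_i$, $\mathbf{x}^{(i)}$, $m$, $n$, $\alpha$ are our own shorthand under the standard TCM setup and may differ from the paper's conventions.

% Hedged sketch (not verbatim from the paper): a treelike committee machine with
% d hidden neurons splits the n-dimensional input x into d disjoint blocks x^{(i)}
% of size n/d, applies the activation f to each block's weighted sum, and takes
% the sign of the aggregate as the network output.
\[
  \hat{y}(\mathbf{x}) \;=\;
  \mathrm{sign}\!\left( \sum_{i=1}^{d} f\!\left( \mathbf{w}_i^{T}\,\mathbf{x}^{(i)} \right) \right),
  \qquad
  \mathbf{x} = \big( \mathbf{x}^{(1)}, \dots, \mathbf{x}^{(d)} \big),
  \qquad
  \mathbf{x}^{(i)} \in \mathbb{R}^{n/d}.
\]
% The activations discussed above correspond to f(z) = sign(z), f(z) = z (linear),
% f(z) = z^2 (quadratic), and f(z) = max(0, z) (ReLU). The memory capacity is the
% largest pattern-to-dimension ratio alpha = m/n at which m generic patterns can
% still be stored (correctly classified) with high probability.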