Fixed width treelike neural networks capacity analysis -- generic activations

Mihailo Stojnic
{"title":"Fixed width treelike neural networks capacity analysis -- generic activations","authors":"Mihailo Stojnic","doi":"arxiv-2402.05696","DOIUrl":null,"url":null,"abstract":"We consider the capacity of \\emph{treelike committee machines} (TCM) neural\nnetworks. Relying on Random Duality Theory (RDT), \\cite{Stojnictcmspnncaprdt23}\nrecently introduced a generic framework for their capacity analysis. An upgrade\nbased on the so-called \\emph{partially lifted} RDT (pl RDT) was then presented\nin \\cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the\nnetworks with the most typical, \\emph{sign}, activations. Here, on the other\nhand, we focus on networks with other, more general, types of activations and\nshow that the frameworks of\n\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently\npowerful to enable handling of such scenarios as well. In addition to the\nstandard \\emph{linear} activations, we uncover that particularly convenient\nresults can be obtained for two very commonly used activations, namely, the\n\\emph{quadratic} and \\emph{rectified linear unit (ReLU)} ones. In more concrete\nterms, for each of these activations, we obtain both the RDT and pl RDT based\nmemory capacities upper bound characterization for \\emph{any} given (even)\nnumber of the hidden layer neurons, $d$. In the process, we also uncover the\nfollowing two, rather remarkable, facts: 1) contrary to the common wisdom, both\nsets of results show that the bounding capacity decreases for large $d$ (the\nwidth of the hidden layer) while converging to a constant value; and 2) the\nmaximum bounding capacity is achieved for the networks with precisely\n\\textbf{\\emph{two}} hidden layer neurons! Moreover, the large $d$ converging\nvalues are observed to be in excellent agrement with the statistical physics\nreplica theory based predictions.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.05696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We consider the capacity of \emph{treelike committee machines} (TCM) neural networks. Relying on Random Duality Theory (RDT), \cite{Stojnictcmspnncaprdt23} recently introduced a generic framework for their capacity analysis. An upgrade based on the so-called \emph{partially lifted} RDT (pl RDT) was then presented in \cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on networks with the most typical, \emph{sign}, activations. Here, on the other hand, we focus on networks with other, more general, types of activations and show that the frameworks of \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently powerful to handle such scenarios as well. In addition to the standard \emph{linear} activations, we uncover that particularly convenient results can be obtained for two very commonly used activations, namely, the \emph{quadratic} and \emph{rectified linear unit (ReLU)} ones. In more concrete terms, for each of these activations, we obtain both the RDT and pl RDT based memory capacity upper bound characterizations for \emph{any} given (even) number of hidden layer neurons, $d$. In the process, we also uncover the following two, rather remarkable, facts: 1) contrary to common wisdom, both sets of results show that the bounding capacity decreases for large $d$ (the width of the hidden layer) while converging to a constant value; and 2) the maximum bounding capacity is achieved for networks with precisely \textbf{\emph{two}} hidden layer neurons! Moreover, the large $d$ converging values are observed to be in excellent agreement with the statistical physics replica theory based predictions.
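
To fix notation, the following is a minimal sketch of the treelike committee machine model and the memory capacity notion the abstract refers to; the block structure and the symbols $\mathbf{w}_i$, $\mathbf{x}^{(i)}$, $m$, $n$, $\alpha$ are our own shorthand under the standard TCM setup and may differ from the paper's conventions.

% Hedged sketch (not verbatim from the paper): a treelike committee machine with
% d hidden neurons splits the n-dimensional input x into d disjoint blocks x^{(i)}
% of size n/d, applies the activation f to each block's weighted sum, and takes
% the sign of the aggregate as the network output.
\[
  \hat{y}(\mathbf{x}) \;=\;
  \mathrm{sign}\!\left( \sum_{i=1}^{d} f\!\left( \mathbf{w}_i^{T}\,\mathbf{x}^{(i)} \right) \right),
  \qquad
  \mathbf{x} = \big( \mathbf{x}^{(1)}, \dots, \mathbf{x}^{(d)} \big),
  \qquad
  \mathbf{x}^{(i)} \in \mathbb{R}^{n/d}.
\]
% The activations discussed above correspond to f(z) = sign(z), f(z) = z (linear),
% f(z) = z^2 (quadratic), and f(z) = max(0, z) (ReLU). The memory capacity is the
% largest pattern-to-dimension ratio alpha = m/n at which m generic patterns can
% still be stored (correctly classified) with high probability.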