Exact capacity of the \emph{wide} hidden layer treelike neural networks with generic activations

Mihailo Stojnic
{"title":"Exact capacity of the \\emph{wide} hidden layer treelike neural networks with generic activations","authors":"Mihailo Stojnic","doi":"arxiv-2402.05719","DOIUrl":null,"url":null,"abstract":"Recent progress in studying \\emph{treelike committee machines} (TCM) neural\nnetworks (NN) in\n\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23}\nshowed that the Random Duality Theory (RDT) and its a \\emph{partially\nlifted}(pl RDT) variant are powerful tools that can be used for very precise\nnetworks capacity analysis. Here, we consider \\emph{wide} hidden layer networks\nand uncover that certain aspects of numerical difficulties faced in\n\\cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we\nemploy recently developed \\emph{fully lifted} (fl) RDT to characterize the\n\\emph{wide} ($d\\rightarrow \\infty$) TCM nets capacity. We obtain explicit,\nclosed form, capacity characterizations for a very generic class of the hidden\nlayer activations. While the utilized approach significantly lowers the amount\nof the needed numerical evaluations, the ultimate fl RDT usefulness and success\nstill require a solid portion of the residual numerical work. To get the\nconcrete capacity values, we take four very famous activations examples:\n\\emph{\\textbf{ReLU}}, \\textbf{\\emph{quadratic}}, \\textbf{\\emph{erf}}, and\n\\textbf{\\emph{tanh}}. After successfully conducting all the residual numerical\nwork for all of them, we uncover that the whole lifting mechanism exhibits a\nremarkably rapid convergence with the relative improvements no better than\n$\\sim 0.1\\%$ happening already on the 3-rd level of lifting. As a convenient\nbonus, we also uncover that the capacity characterizations obtained on the\nfirst and second level of lifting precisely match those obtained through the\nstatistical physics replica theory methods in \\cite{ZavPeh21} for the generic\nand in \\cite{BalMalZech19} for the ReLU activations.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.05719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent progress in studying \emph{treelike committee machines} (TCM) neural networks (NN) in \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23} showed that the Random Duality Theory (RDT) and its \emph{partially lifted} (pl RDT) variant are powerful tools for very precise network capacity analysis. Here, we consider \emph{wide} hidden layer networks and uncover that certain numerical difficulties faced in \cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we employ the recently developed \emph{fully lifted} (fl) RDT to characterize the capacity of \emph{wide} ($d\rightarrow \infty$) TCM nets. We obtain explicit, closed form capacity characterizations for a very generic class of hidden layer activations. While the utilized approach significantly lowers the amount of needed numerical evaluations, the ultimate usefulness and success of fl RDT still require a solid portion of residual numerical work. To get concrete capacity values, we consider four well-known activation examples: \emph{\textbf{ReLU}}, \textbf{\emph{quadratic}}, \textbf{\emph{erf}}, and \textbf{\emph{tanh}}. After successfully conducting all the residual numerical work for each of them, we uncover that the whole lifting mechanism exhibits remarkably rapid convergence, with relative improvements of no more than $\sim 0.1\%$ already at the third level of lifting. As a convenient bonus, we also uncover that the capacity characterizations obtained on the first and second levels of lifting precisely match those obtained through the statistical physics replica theory methods in \cite{ZavPeh21} for generic activations and in \cite{BalMalZech19} for the ReLU activation.
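For readers unfamiliar with the setup, the following is a minimal illustrative sketch (not taken from the paper) of a treelike committee machine with the four activations named above, assuming the standard treelike architecture in which the input is split into $d$ disjoint blocks, each block feeds exactly one hidden unit, and the output is the sign of the sum of the activated hidden-unit pre-activations. The names `tcm_output` and `ACTIVATIONS` are ours, for illustration only.

```python
import numpy as np
from scipy.special import erf

# The four hidden-layer activations discussed in the abstract.
ACTIVATIONS = {
    "relu":      lambda t: np.maximum(t, 0.0),
    "quadratic": lambda t: t ** 2,
    "erf":       lambda t: erf(t),
    "tanh":      lambda t: np.tanh(t),
}

def tcm_output(x, W, activation="relu"):
    """Output of a treelike committee machine (TCM) with d hidden units.

    Assumed (standard) treelike architecture: the n-dimensional input x is
    split into d disjoint blocks of size n/d, each block is seen by exactly
    one hidden unit, and the network outputs the sign of the sum of the
    activated hidden-unit pre-activations.

    x : (n,) input vector, with n divisible by d
    W : (d, n//d) hidden-layer weights, one row per hidden unit
    """
    f = ACTIVATIONS[activation]
    d, block_size = W.shape
    blocks = x.reshape(d, block_size)        # non-overlapping receptive fields
    pre = np.einsum("ij,ij->i", W, blocks)   # one scalar pre-activation per unit
    return np.sign(f(pre).sum())

# Usage example: a finite-width (d = 5) TCM evaluated on a random pattern;
# the paper's capacity analysis concerns the wide limit d -> infinity.
rng = np.random.default_rng(0)
n, d = 50, 5
x = rng.standard_normal(n)
W = rng.standard_normal((d, n // d))
print(tcm_output(x, W, activation="erf"))
```

The storage capacity studied in the paper is, informally, the largest ratio of the number of such labeled patterns to the number of network weights for which weights correctly classifying all patterns typically exist; the sketch only fixes the architecture and activations, not that analysis.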