A theory of synaptic neural balance: From local to global order

Impact Factor 4.6 · CAS Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Pierre Baldi, Antonios Alexos, Ian Domingo, Alireza Rahmansetayesh
{"title":"A theory of synaptic neural balance: From local to global order","authors":"Pierre Baldi,&nbsp;Antonios Alexos,&nbsp;Ian Domingo,&nbsp;Alireza Rahmansetayesh","doi":"10.1016/j.artint.2025.104360","DOIUrl":null,"url":null,"abstract":"<div><div>We develop a general theory of synaptic neural balance and how it can emerge or be enforced in neural networks. For a given additive cost function <em>R</em> (regularizer), a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with <span><math><msub><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span> regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all <span><math><msub><mrow><mi>L</mi></mrow><mrow><mi>p</mi></mrow></msub></math></span> (<span><math><mi>p</mi><mo>&gt;</mo><mn>0</mn></math></span>) regularizers. The third direction is the extension to non-layered architectures, recurrent architectures, convolutional architectures, as well as architectures with mixed activation functions and to different balancing algorithms. Gradient descent on the error function alone does not converge in general to a balanced state, where every neuron is in balance, even when starting from a balanced state. However, gradient descent on the regularized error function ought to converge to a balanced state, and thus network balance can be used to assess learning progress. The theory is based on two local neuronal operations: scaling which is commutative, and balancing which is not commutative. Finally, and most importantly, given any set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this convergence is the existence of an underlying strictly convex optimization problem where the relevant variables are constrained to a linear, only architecture-dependent, manifold. Simulations show that balancing neurons prior to learning, or during learning in alternation with gradient descent steps, can improve learning speed and performance thereby expanding the arsenal of available training tools. Scaling and balancing operations are entirely local and thus physically plausible in biological and neuromorphic neural networks.</div></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"346 ","pages":"Article 104360"},"PeriodicalIF":4.6000,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370225000797","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

We develop a general theory of synaptic neural balance and how it can emerge or be enforced in neural networks. For a given additive cost function R (regularizer), a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with L2 regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all Lp (p>0) regularizers. The third direction is the extension to non-layered, recurrent, and convolutional architectures, as well as to architectures with mixed activation functions and to different balancing algorithms. Gradient descent on the error function alone does not in general converge to a balanced state, in which every neuron is in balance, even when starting from a balanced state. However, gradient descent on the regularized error function ought to converge to a balanced state, so network balance can be used to assess learning progress. The theory is based on two local neuronal operations: scaling, which is commutative, and balancing, which is not. Finally, and most importantly, given any set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this convergence is the existence of an underlying strictly convex optimization problem in which the relevant variables are constrained to a linear manifold that depends only on the architecture. Simulations show that balancing neurons prior to learning, or during learning in alternation with gradient descent steps, can improve learning speed and performance, thereby expanding the arsenal of available training tools. Scaling and balancing operations are entirely local and thus physically plausible in biological and neuromorphic neural networks.
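To make the two local operations concrete, the sketch below implements scaling and L2 balancing for a small feedforward ReLU network, together with a stochastic balancing loop of the kind described in the abstract. It is an illustrative reconstruction, not the authors' code: the network shape, the omission of biases, and the helper names scale_neuron and balance_neuron are assumptions. For a ReLU unit, multiplying its input weights by a factor lam > 0 and its output weights by 1/lam leaves the network function unchanged (since ReLU(lam*x) = lam*ReLU(x) for lam > 0); balancing picks the particular lam that equates the L2 cost of the input and output weights, namely lam = (cost_out/cost_in)**0.25.

```python
# Minimal sketch (assumed, not the paper's reference implementation) of the
# scaling and balancing operations for a feedforward ReLU network with L2 costs.
# Biases are omitted for simplicity; layer sizes and helper names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Three-layer ReLU network: x -> W[0] -> ReLU -> W[1] -> ReLU -> W[2] -> y
sizes = [4, 8, 8, 2]
W = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]

def forward(W, x):
    h = x
    for Wl in W[:-1]:
        h = np.maximum(0.0, Wl @ h)   # ReLU hidden layers
    return W[-1] @ h                  # linear output layer

def scale_neuron(W, layer, j, lam):
    """Scaling: multiply neuron j's input weights by lam and its output weights
    by 1/lam. Because ReLU is positively homogeneous, the network function is
    unchanged for any lam > 0."""
    W[layer][j, :] *= lam             # input weights (row of the incoming matrix)
    W[layer + 1][:, j] /= lam         # output weights (column of the outgoing matrix)

def balance_neuron(W, layer, j):
    """Balancing: choose lam so that the L2 cost of the input weights equals the
    L2 cost of the output weights, i.e. lam**2 * c_in = c_out / lam**2."""
    c_in = np.sum(W[layer][j, :] ** 2)
    c_out = np.sum(W[layer + 1][:, j] ** 2)
    lam = (c_out / c_in) ** 0.25
    scale_neuron(W, layer, j, lam)

# Stochastic balancing: repeatedly balance a randomly chosen hidden neuron.
x = rng.standard_normal(sizes[0])
y_before = forward(W, x)
for _ in range(2000):
    layer = rng.integers(0, 2)                 # hidden layers only
    j = rng.integers(0, sizes[layer + 1])      # random neuron in that layer
    balance_neuron(W, layer, j)
y_after = forward(W, x)

print("output unchanged:", np.allclose(y_before, y_after))
print("total L2 cost:", sum(np.sum(Wl ** 2) for Wl in W))
```

In this sketch, repeatedly balancing randomly chosen hidden neurons never changes the network's input-output function, while, per the abstract, the stochastic balancing iterates converge to a single, unique set of balanced weights.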
Source journal
Artificial Intelligence (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles per year: 118
Review time: 8 months
Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.