Hebbian Descent: A Unified View on Log-Likelihood Learning

IF 2.7 | CAS Quartile 4, Computer Science | JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jan Melchior;Robin Schiewer;Laurenz Wiskott
{"title":"希比后裔:对数似然学习的统一观点","authors":"Jan Melchior;Robin Schiewer;Laurenz Wiskott","doi":"10.1162/neco_a_01684","DOIUrl":null,"url":null,"abstract":"This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary loss activation function combinations that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent has better or similar performance as gradient descent.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1669-1712"},"PeriodicalIF":2.7000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hebbian Descent: A Unified View on Log-Likelihood Learning\",\"authors\":\"Jan Melchior;Robin Schiewer;Laurenz Wiskott\",\"doi\":\"10.1162/neco_a_01684\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. 
We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary loss activation function combinations that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent has better or similar performance as gradient descent.\",\"PeriodicalId\":54731,\"journal\":{\"name\":\"Neural Computation\",\"volume\":\"36 9\",\"pages\":\"1669-1712\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10661272/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10661272/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary loss activation function combinations that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent has better or similar performance as gradient descent.
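
To make the abstract's central point concrete, the sketch below contrasts a regular gradient-descent update for a single output layer with the Hebbian descent update, in which the derivative of the activation function is disregarded. This is a minimal illustration based only on the description above, not the authors' reference code; the layer shape, learning rate, sigmoid output, squared-error baseline, and the approximation of centering by subtracting an input mean are all illustrative assumptions.

# Minimal sketch (assumptions as noted above), contrasting the two output-layer updates.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gradient_descent_step(W, b, x, t, lr=0.1):
    """Squared-error gradient step for a single sigmoid output layer."""
    a = W @ x + b
    h = sigmoid(a)
    # The error signal carries the activation derivative h * (1 - h),
    # which vanishes in the saturated regions of the sigmoid.
    delta = (h - t) * h * (1.0 - h)
    W -= lr * np.outer(delta, x)
    b -= lr * delta
    return W, b

def hebbian_descent_step(W, b, x, t, lr=0.1, x_mean=None):
    """Hebbian descent step: the activation derivative is disregarded.

    For a sigmoid output, the resulting error signal (h - t) coincides with
    the cross-entropy (generalized log-likelihood) gradient, matching the
    abstract's statement that established loss/activation pairings are
    subsumed. Centering is approximated here by subtracting an input mean.
    """
    a = W @ x + b
    h = sigmoid(a)
    delta = h - t                      # no h * (1 - h) factor
    x_c = x if x_mean is None else x - x_mean
    W -= lr * np.outer(delta, x_c)
    b -= lr * delta
    return W, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(2, 3))
    b = np.zeros(2)
    x = rng.normal(size=3)
    t = np.array([1.0, 0.0])
    print(hebbian_descent_step(W.copy(), b.copy(), x, t))

Even when the output unit is saturated (h close to 0 or 1), the Hebbian descent error signal (h - t) does not vanish, which is the property the abstract credits for avoiding vanishing error signals in the output layer.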
Source journal
Neural Computation (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 6.30
Self-citation rate: 3.40%
Articles per year: 83
Average review time: 3.0 months
Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.