{"title":"Efficient Hyperdimensional Computing With Spiking Phasors","authors":"Jeff Orchard;P. Michael Furlong;Kathryn Simone","doi":"10.1162/neco_a_01693","DOIUrl":"10.1162/neco_a_01693","url":null,"abstract":"Hyperdimensional (HD) computing (also referred to as vector symbolic architectures, VSAs) offers a method for encoding symbols into vectors, allowing for those symbols to be combined in different ways to form other vectors in the same vector space. The vectors and operators form a compositional algebra, such that composite vectors can be decomposed back to their constituent vectors. Many useful algorithms have implementations in HD computing, such as classification, spatial navigation, language modeling, and logic. In this letter, we propose a spiking implementation of Fourier holographic reduced representation (FHRR), one of the most versatile VSAs. The phase of each complex number of an FHRR vector is encoded as a spike time within a cycle. Neuron models derived from these spiking phasors can perform the requisite vector operations to implement an FHRR. We demonstrate the power and versatility of our spiking networks in a number of foundational problem domains, including symbol binding and unbinding, spatial representation, function representation, function integration, and memory (i.e., signal delay).","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1886-1911"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold Gaussian Variational Bayes on the Precision Matrix","authors":"Martin Magris;Mostafa Shabani;Alexandros Iosifidis","doi":"10.1162/neco_a_01686","DOIUrl":"10.1162/neco_a_01686","url":null,"abstract":"We propose an optimization algorithm for variational inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for gaussian variational inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our manifold gaussian variational Bayes on the precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parameterization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five data sets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1744-1798"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142009907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization","authors":"Yiming Jiang;Jinlan Liu;Dongpo Xu;Danilo P. Mandic","doi":"10.1162/neco_a_01692","DOIUrl":"10.1162/neco_a_01692","url":null,"abstract":"Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β1 increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1912-1938"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hebbian Descent: A Unified View on Log-Likelihood Learning","authors":"Jan Melchior;Robin Schiewer;Laurenz Wiskott","doi":"10.1162/neco_a_01684","DOIUrl":"10.1162/neco_a_01684","url":null,"abstract":"This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary loss activation function combinations that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent has better or similar performance as gradient descent.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1669-1712"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142009906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle","authors":"Theodore Jerome Tinker;Kenji Doya;Jun Tani","doi":"10.1162/neco_a_01690","DOIUrl":"10.1162/neco_a_01690","url":null,"abstract":"In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well established in the literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in the literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially when both are employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests that implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1854-1885"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Eyes–Inspired Recurrent Neural Networks Are More Robust Against Adversarial Noises","authors":"Minkyu Choi;Yizhen Zhang;Kuan Han;Xiaokai Wang;Zhongming Liu","doi":"10.1162/neco_a_01688","DOIUrl":"10.1162/neco_a_01688","url":null,"abstract":"Humans actively observe the visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNN) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one determining the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention and that the model can enhance robustness against adversarial attacks due to its retinal sampling and recurrent processing. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1713-1743"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extended Poisson Gaussian-Process Latent Variable Model for Unsupervised Neural Decoding","authors":"Della Daiyi Luo;Bapun Giri;Kamran Diba;Caleb Kemere","doi":"10.1162/neco_a_01685","DOIUrl":"10.1162/neco_a_01685","url":null,"abstract":"Dimension reduction on neural activity paves a way for unsupervised neural decoding by dissociating the measurement of internal neural pattern reactivation from the measurement of external variable tuning. With assumptions only on the smoothness of latent dynamics and of internal tuning curves, the Poisson gaussian-process latent variable model (P-GPLVM; Wu et al., 2017) is a powerful tool to discover the low-dimensional latent structure for high-dimensional spike trains. However, when given novel neural data, the original model lacks a method to infer their latent trajectories in the learned latent space, limiting its ability for estimating the neural reactivation. Here, we extend the P-GPLVM to enable the latent variable inference of new data constrained by previously learned smoothness and mapping information. We also describe a principled approach for the constrained latent variable inference for temporally compressed patterns of activity, such as those found in population burst events during hippocampal sharp-wave ripples, as well as metrics for assessing the validity of neural pattern reactivation and inferring the encoded experience. Applying these approaches to hippocampal ensemble recordings during active maze exploration, we replicate the result that P-GPLVM learns a latent space encoding the animal’s position. We further demonstrate that this latent space can differentiate one maze context from another. By inferring the latent variables of new neural data during running, certain neural patterns are observed to reactivate, in accordance with the similarity of experiences encoded by its nearby neural trajectories in the training data manifold. Finally, reactivation of neural patterns can be estimated for neural activity during population burst events as well, allowing the identification for replay events of versatile behaviors and more general experiences. Thus, our extension of the P-GPLVM framework for unsupervised analysis of neural activity can be used to answer critical questions related to scientific discovery.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1449-1475"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Complexity of Convolutional Neural Networks","authors":"Jiří Šíma;Petra Vidnerová;Vojtěch Mrázek","doi":"10.1162/neco_a_01676","DOIUrl":"10.1162/neco_a_01676","url":null,"abstract":"The energy efficiency of hardware implementations of convolutional neural networks (CNNs) is critical to their widespread deployment in low-power mobile devices. Recently, a number of methods have been proposed for providing energy-optimal mappings of CNNs onto diverse hardware accelerators. Their estimated energy consumption is related to specific implementation details and hardware parameters, which does not allow for machine-independent exploration of CNN energy measures. In this letter, we introduce a simplified theoretical energy complexity model for CNNs, based on only a two-level memory hierarchy that captures asymptotically all important sources of energy consumption for different CNN hardware implementations. In this model, we derive a simple energy lower bound and calculate the energy complexity of evaluating a CNN layer for two common data flows, providing corresponding upper bounds. According to statistical tests, the theoretical energy upper and lower bounds we present fit asymptotically very well with the real energy consumption of CNN implementations on the Simba and Eyeriss hardware platforms, estimated by the Timeloop/Accelergy program, which validates the proposed energy complexity model for CNNs.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1601-1625"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141082351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trade-Offs Between Energy and Depth of Neural Networks","authors":"Kei Uchizawa;Haruki Abe","doi":"10.1162/neco_a_01683","DOIUrl":"10.1162/neco_a_01683","url":null,"abstract":"We present an investigation on threshold circuits and other discretized neural networks in terms of the following four computational resources—size (the number of gates), depth (the number of layers), weight (weight resolution), and energy—where the energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all the input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, it holds that log(rk(f)) ≤ ed(log s + log w + log n) regardless of the algorithm employed by C to compute f, where rk(f) is a parameter solely determined by a scale of f and defined as the maximum rank of a communication matrix with regard to f taken over all the possible partitions of the n input variables. For example, given a Boolean function CD_n(ξ) = ⋁_{i=1}^{n/2} (ξ_i ∧ ξ_{n/2+i}), we can prove that n/2 ≤ ed(log s + log w + log n) holds for any circuit C computing CD_n. While its left-hand side is linear in n, its right-hand side is bounded by the product of the logarithmic factors of s, w, n and the linear factors of d, e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 needs to be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1541-1567"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision Benchmark Data Sets","authors":"Maria Osório;Andreas Wichert","doi":"10.1162/neco_a_01677","DOIUrl":"10.1162/neco_a_01677","url":null,"abstract":"In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they significantly differ from human visual perception, prioritizing pixel-level correlations and statistical patterns, often overlooking object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks—Fruits 360, CIFAR-10, and Fashion MNIST—each visual feature is individually input into a neural network. Results reveal data set–dependent variations in classification accuracy, highlighting that deep learning models tend to learn pixel-level correlations instead of fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input for a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images, achieving exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on the CIFAR-10 data set and evaluated its performance on the “dog” class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN's poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception, highlighting the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1626-1642"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141082367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}