{"title":"Reformulation of RBM to Unify Linear and Nonlinear Dimensionality Reduction.","authors":"Jiangsheng You, Chun-Yen Liu","doi":"10.1162/neco_a_01751","DOIUrl":"https://doi.org/10.1162/neco_a_01751","url":null,"abstract":"<p><p>A restricted Boltzmann machine (RBM) is a two-layer neural network with shared weights and has been extensively studied for dimensionality reduction, data representation, and recommendation systems in the literature. The traditional RBM requires a probabilistic interpretation of the values on both layers and a Markov chain Monte Carlo (MCMC) procedure to generate samples during the training. The contrastive divergence (CD) is efficient to train the RBM, but its convergence has not been proved mathematically. In this letter, we investigate the RBM by using a maximum a posteriori (MAP) estimate and the expectation-maximization (EM) algorithm. We show that the CD algorithm without MCMC is convergent for the conditional likelihood object function. Another key contribution in this letter is the reformulation of the RBM into a deterministic model. Within the reformulated RBM, the CD algorithm without MCMC approximates the gradient descent (GD) method. This reformulated RBM can take the continuous scalar and vector variables on the nodes with flexibility in choosing the activation functions. Numerical experiments show its capability in both linear and nonlinear dimensionality reduction, and for the nonlinear dimensionality reduction, the reformulated RBM can outperform principal component analysis (PCA) by choosing the proper activation functions. Finally, we demonstrate its application to vector-valued nodes for the CIFAR-10 data set (color images) and the multivariate sequence data, which cannot be configured naturally with the traditional RBM. This work not only provides theoretical insights regarding the traditional RBM but also unifies the linear and nonlinear dimensionality reduction for scalar and vector variables.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-22"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adding Space to Random Networks of Spiking Neurons: A Method Based on Scaling the Network Size.","authors":"Cecilia Romaro, Jose Roberto Castilho Piqueira, A C Roque","doi":"10.1162/neco_a_01747","DOIUrl":"https://doi.org/10.1162/neco_a_01747","url":null,"abstract":"<p><p>Many spiking neural network models are based on random graphs that do not include topological and structural properties featured in real brain networks. To turn these models into spatial networks that describe the topographic arrangement of connections is a challenging task because one has to deal with neurons at the spatial network boundary. Addition of space may generate spurious network behavior like oscillations introduced by periodic boundary conditions or unbalanced neuronal spiking due to lack or excess of connections. Here, we introduce a boundary solution method for networks with added spatial extension that prevents the occurrence of spurious spiking behavior. The method is based on a recently proposed technique for scaling the network size that preserves first- and second-order statistics.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-30"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Elucidating the Theoretical Underpinnings of Surrogate Gradient Learning in Spiking Neural Networks.","authors":"Julia Gygax, Friedemann Zenke","doi":"10.1162/neco_a_01752","DOIUrl":"https://doi.org/10.1162/neco_a_01752","url":null,"abstract":"<p><p>Training spiking neural networks to approximate universal functions is essential for studying information processing in the brain and for neuromorphic computing. Yet the binary nature of spikes poses a challenge for direct gradient-based training. Surrogate gradients have been empirically successful in circumventing this problem, but their theoretical foundation remains elusive. Here, we investigate the relation of surrogate gradients to two theoretically well-founded approaches. On the one hand, we consider smoothed probabilistic models, which, due to the lack of support for automatic differentiation, are impractical for training multilayer spiking neural networks but provide derivatives equivalent to surrogate gradients for single neurons. On the other hand, we investigate stochastic automatic differentiation, which is compatible with discrete randomness but has not yet been used to train spiking neural networks. We find that the latter gives surrogate gradients a theoretical basis in stochastic spiking neural networks, where the surrogate derivative matches the derivative of the neuronal escape noise function. This finding supports the effectiveness of surrogate gradients in practice and suggests their suitability for stochastic spiking neural networks. However, surrogate gradients are generally not gradients of a surrogate loss despite their relation to stochastic automatic differentiation. Nevertheless, we empirically confirm the effectiveness of surrogate gradients in stochastic multilayer spiking neural networks and discuss their relation to deterministic networks as a special case. Our work gives theoretical support to surrogate gradients and the choice of a suitable surrogate derivative in stochastic spiking neural networks.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-40"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Synaptic Connection Strength Changes Dynamics in a Population Firing Rate Model in Response to Continuous External Stimuli.","authors":"Masato Sugino, Mai Tanaka, Kenta Shimba, Kiyoshi Kotani, Yasuhiko Jimbo","doi":"10.1162/neco_a_01749","DOIUrl":"https://doi.org/10.1162/neco_a_01749","url":null,"abstract":"<p><p>Neural network complexity allows for diverse neuronal population dynamics and realizes higherorder brain functions such as cognition and memory. Complexity is enhanced through chemical synapses with exponentially decaying conductance and greater variation in the neuronal connection strength due to synaptic plasticity. However, in the macroscopic neuronal population model, synaptic connections are often described by spike connections, and connection strengths within the population are assumed to be uniform. Thus, the effects of synaptic connections variation on network synchronization remain unclear. Based on recent advances in mean field theory for the quadratic integrate-and-fire neuronal network model, we introduce synaptic conductance and variation of connection strength into the excitatory and inhibitory neuronal population model and derive the macroscopic firing rate equations for faithful modeling. We then introduce a heuristic switching rule of the dynamic system with respect to the mean membrane potentials to avoid divergences in the computation caused by variations in the neuronal connection strength. We show that the switching rule agrees with the numerical computation of the microscopic level model. In the derived model, variations in synaptic conductance and connection strength strongly alter the stability of the solutions to the equations, which is related to the mechanism of synchronous firing. When we apply physiologically plausible values from layer 4 of the mammalian primary visual cortex to the derived model, we observe event-related desynchronization at the alpha and beta frequencies and event-related synchronization at the gamma frequency over a wide range of balanced external currents. Our results show that the introduction of complex synaptic connections and physiologically valid numerical values into the low-dimensional mean field equations reproduces dynamic changes such as eventrelated (de)synchronization, and provides a unique mathematical insight into the relationship between synaptic strength variation and oscillatory mechanism.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-23"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilevel Data Representation for Training Deep Helmholtz Machines.","authors":"Jose Miguel Ramos, Luis Sa-Couto, Andreas Wichert","doi":"10.1162/neco_a_01748","DOIUrl":"https://doi.org/10.1162/neco_a_01748","url":null,"abstract":"<p><p>A vast majority of the current research in the field of machine learning is done using algorithms with strong arguments pointing to their biological implausibility such as backpropagation, deviating the field's focus from understanding its original organic inspiration to a compulsive search for optimal performance. Yet there have been a few proposed models that respect most of the biological constraints present in the human brain and are valid candidates for mimicking some of its properties and mechanisms. In this letter, we focus on guiding the learning of a biologically plausible generative model called the Helmholtz machine in complex search spaces using a heuristic based on the human image perception mechanism. We hypothesize that this model's learning algorithm is not fit for deep networks due to its Hebbian-like local update rule, rendering it incapable of taking full advantage of the compositional properties that multilayer networks provide. We propose to overcome this problem by providing the network's hidden layers with visual queues at different resolutions using multilevel data representation. The results on several image data sets showed that the model was able to not only obtain better overall quality but also a wider diversity in the generated images, corroborating our intuition that using our proposed heuristic allows the model to take more advantage of the network's depth growth. More important, they show the unexplored possibilities underlying brain-inspired models and techniques.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-24"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Leaky Integrate-and-Fire Neuron Is a Change-Point Detector for Compound Poisson Processes.","authors":"Shivaram Mani, Paul Hurley, André van Schaik, Travis Monk","doi":"10.1162/neco_a_01750","DOIUrl":"https://doi.org/10.1162/neco_a_01750","url":null,"abstract":"<p><p>Animal nervous systems can detect changes in their environments within hundredths of a second. They do so by discerning abrupt shifts in sensory neural activity. Many neuroscience studies have employed change-point detection (CPD) algorithms to estimate such abrupt shifts in neural activity. But very few studies have suggested that spiking neurons themselves are online change-point detectors. We show that a leaky integrate-and-fire (LIF) neuron implements an online CPD algorithm for a compound Poisson process. We quantify the CPD performance of an LIF neuron under various regions of its parameter space. We show that CPD can be a recursive algorithm where the output of one algorithm can be input to another. Then we show that a simple feedforward network of LIF neurons can quickly and reliably detect very small changes in input spiking rates. For example, our network detects a 5% change in input rates within 20 ms on average, and false-positive detections are extremely rare. In a rigorous statistical context, we interpret the salient features of the LIF neuron: its membrane potential, synaptic weight, time constant, resting potential, action potentials, and threshold. Our results potentially generalize beyond the LIF neuron model and its associated CPD problem. If spiking neurons perform change-point detection on their inputs, then the electrophysiological properties of their membranes must be related to the spiking statistics of their inputs. We demonstrate one example of this relationship for the LIF neuron and compound Poisson processes and suggest how to test this hypothesis more broadly. Maybe neurons are not noisy devices whose action potentials must be averaged over time or populations. Instead, neurons might implement sophisticated, optimal, and online statistical algorithms on their inputs.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-31"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge as a Breaking of Ergodicity","authors":"Yang He;Vassiliy Lubchenko","doi":"10.1162/neco_a_01741","DOIUrl":"10.1162/neco_a_01741","url":null,"abstract":"We construct a thermodynamic potential that can guide training of a generative model defined on a set of binary degrees of freedom. We argue that upon reduction in description, so as to make the generative model computationally manageable, the potential develops multiple minima. This is mirrored by the emergence of multiple minima in the free energy proper of the generative model itself. The variety of training samples that employ N binary degrees of freedom is ordinarily much lower than the size 2N of the full phase space. The nonrepresented configurations, we argue, should be thought of as comprising a high-temperature phase separated by an extensive energy gap from the configurations composing the training set. Thus, training amounts to sampling a free energy surface in the form of a library of distinct bound states, each of which breaks ergodicity. The ergodicity breaking prevents escape into the near continuum of states comprising the high-temperature phase; thus, it is necessary for proper functionality. It may, however, have the side effect of limiting access to patterns that were underrepresented in the training set. At the same time, the ergodicity breaking within the library complicates both learning and retrieval. As a remedy, one may concurrently employ multiple generative models—up to one model per free energy minimum.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 4","pages":"742-792"},"PeriodicalIF":2.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Inference and Intentional Behavior","authors":"Karl J. Friston;Tommaso Salvatori;Takuya Isomura;Alexander Tschantz;Alex Kiefer;Tim Verbelen;Magnus Koudahl;Aswin Paul;Thomas Parr;Adeel Razi;Brett J. Kagan;Christopher L. Buckley;Maxwell J. D. Ramstead","doi":"10.1162/neco_a_01738","DOIUrl":"10.1162/neco_a_01738","url":null,"abstract":"Recent advances in theoretical biology suggest that key definitions of basal cognition and sentient behavior may arise as emergent properties of in vitro cell cultures and neuronal networks. Such neuronal networks reorganize activity to demonstrate structured behaviors when embodied in structured information landscapes. In this article, we characterize this kind of self-organization through the lens of the free energy principle, that is, as self-evidencing. We do this by first discussing the definitions of reactive and sentient behavior in the setting of active inference, which describes the behavior of agents that model the consequences of their actions. We then introduce a formal account of intentional behavior that describes agents as driven by a preferred end point or goal in latent state-spaces. We then investigate these forms of (reactive, sentient, and intentional) behavior using simulations. First, we simulate the in vitro experiments, in which neuronal cultures modulated activity to improve gameplay in a simplified version of Pong by implementing nested, free energy minimizing processes. The simulations are then used to deconstruct the ensuing predictive behavior, leading to the distinction between merely reactive, sentient, and intentional behavior with the latter formalized in terms of inductive inference. This distinction is further studied using simple machine learning benchmarks (navigation in a grid world and the Tower of Hanoi problem) that show how quickly and efficiently adaptive behavior emerges under an inductive form of active inference.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 4","pages":"666-700"},"PeriodicalIF":2.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning in Wilson-Cowan Model for Metapopulation","authors":"Raffaele Marino;Lorenzo Buffoni;Lorenzo Chicchi;Francesca Di Patti;Diego Febbe;Lorenzo Giambagli;Duccio Fanelli","doi":"10.1162/neco_a_01744","DOIUrl":"10.1162/neco_a_01744","url":null,"abstract":"The Wilson-Cowan model for metapopulation, a neural mass network model, treats different subcortical regions of the brain as connected nodes, with connections representing various types of structural, functional, or effective neuronal connectivity between these regions. Each region comprises interacting populations of excitatory and inhibitory cells, consistent with the standard Wilson-Cowan model. In this article, we show how to incorporate stable attractors into such a metapopulation model’s dynamics. By doing so, we transform the neural mass network model into a biologically inspired learning algorithm capable of solving different classification tasks. We test it on MNIST and Fashion MNIST in combination with convolutional neural networks, as well as on CIFAR-10 and TF-FLOWERS, and in combination with a transformer architecture (BERT) on IMDB, consistently achieving high classification accuracy.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 4","pages":"701-741"},"PeriodicalIF":2.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nearly Optimal Learning Using Sparse Deep ReLU Networks in Regularized Empirical Risk Minimization With Lipschitz Loss","authors":"Ke Huang;Mingming Liu;Shujie Ma","doi":"10.1162/neco_a_01742","DOIUrl":"10.1162/neco_a_01742","url":null,"abstract":"We propose a sparse deep ReLU network (SDRN) estimator of the regression function obtained from regularized empirical risk minimization with a Lipschitz loss function. Our framework can be applied to a variety of regression and classification problems. We establish novel nonasymptotic excess risk bounds for our SDRN estimator when the regression function belongs to a Sobolev space with mixed derivatives. We obtain a new, nearly optimal, risk rate in the sense that the SDRN estimator can achieve nearly the same optimal minimax convergence rate as one-dimensional nonparametric regression with the dimension involved in a logarithm term only when the feature dimension is fixed. The estimator has a slightly slower rate when the dimension grows with the sample size. We show that the depth of the SDRN estimator grows with the sample size in logarithmic order, and the total number of nodes and weights grows in polynomial order of the sample size to have the nearly optimal risk rate. The proposed SDRN can go deeper with fewer parameters to well estimate the regression and overcome the overfitting problem encountered by conventional feedforward neural networks.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 4","pages":"815-870"},"PeriodicalIF":2.7,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}