{"title":"Replay as a Basis for Backpropagation Through Time in the Brain.","authors":"Huzi Cheng, Joshua W Brown","doi":"10.1162/neco_a_01735","DOIUrl":"10.1162/neco_a_01735","url":null,"abstract":"<p><p>How episodic memories are formed in the brain is a continuing puzzle for the neuroscience community. The brain areas that are critical for episodic learning (e.g., the hippocampus) are characterized by recurrent connectivity and generate frequent offline replay events. The function of the replay events is a subject of active debate. Recurrent connectivity, computational simulations show, enables sequence learning when combined with a suitable learning algorithm such as backpropagation through time (BPTT). BPTT, however, is not biologically plausible. We describe here, for the first time, a biologically plausible variant of BPTT in a reversible recurrent neural network, R2N2, that critically leverages offline replay to support episodic learning. The model uses forward and backward offline replay to transfer information between two recurrent neural networks, a cache and a consolidator, that perform rapid one-shot learning and statistical learning, respectively. Unlike replay in standard BPTT, this architecture requires no artificial external memory store. This approach outperforms existing solutions like random feedback local online learning and reservoir network. It also accounts for the functional significance of hippocampal replay events. We demonstrate the R2N2 network properties using benchmark tests from computer science and simulate the rodent delayed alternation T-maze task.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"403-436"},"PeriodicalIF":2.7,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradual Domain Adaptation via Normalizing Flows.","authors":"Shogo Sagawa, Hideitsu Hino","doi":"10.1162/neco_a_01734","DOIUrl":"10.1162/neco_a_01734","url":null,"abstract":"<p><p>Standard domain adaptation methods do not work well when a large gap exists between the source and target domains. Gradual domain adaptation is one of the approaches used to address the problem. It involves leveraging the intermediate domain, which gradually shifts from the source domain to the target domain. In previous work, it is assumed that the number of intermediate domains is large and the distance between adjacent domains is small; hence, the gradual domain adaptation algorithm, involving self-training with unlabeled data sets, is applicable. In practice, however, gradual self-training will fail because the number of intermediate domains is limited and the distance between adjacent domains is large. We propose the use of normalizing flows to deal with this problem while maintaining the framework of unsupervised domain adaptation. The proposed method learns a transformation from the distribution of the target domains to the gaussian mixture distribution via the source domain. We evaluate our proposed method by experiments using real-world data sets and confirm that it mitigates the problem we have explained and improves the classification performance.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"522-568"},"PeriodicalIF":2.7,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering Dynamical Equations of Stochastic Decision Models Using Data-Driven SINDy Algorithm.","authors":"Brendan Lenfesty, Saugat Bhattacharyya, KongFatt Wong-Lin","doi":"10.1162/neco_a_01736","DOIUrl":"10.1162/neco_a_01736","url":null,"abstract":"<p><p>Decision formation in perceptual decision making involves sensory evidence accumulation instantiated by the temporal integration of an internal decision variable toward some decision criterion or threshold, as described by sequential sampling theoretical models. The decision variable can be represented in the form of experimentally observable neural activities. Hence, elucidating the appropriate theoretical model becomes crucial to understanding the mechanisms underlying perceptual decision formation. Existing computational methods are limited to either fitting of choice behavioral data or linear model estimation from neural activity data. In this work, we made use of sparse identification of nonlinear dynamics (SINDy), a data-driven approach, to elucidate the deterministic linear and nonlinear components of often-used stochastic decision models within reaction time task paradigms. Based on the simulated decision variable activities of the models and assuming the noise coefficient term is known beforehand, SINDy, enhanced with approaches using multiple trials, could readily estimate the deterministic terms in the dynamical equations, choice accuracy, and decision time of the models across a range of signal-to-noise ratio values. In particular, SINDy performed the best using the more memory-intensive multi-trial approach while trial-averaging of parameters performed more moderately. The single-trial approach, although expectedly not performing as well, may be useful for real-time modeling. Taken together, our work offers alternative approaches for SINDy to uncover the dynamics in perceptual decision making and, more generally, for first-passage time problems.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"569-587"},"PeriodicalIF":2.7,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Compressive Power of Autoencoders With Linear and ReLU Activation Functions","authors":"Liangjie Sun;Chenyao Wu;Wai-Ki Ching;Tatsuya Akutsu","doi":"10.1162/neco_a_01729","DOIUrl":"10.1162/neco_a_01729","url":null,"abstract":"In this article, we mainly study the depth and width of autoencoders consisting of rectified linear unit (ReLU) activation functions. An autoencoder is a layered neural network consisting of an encoder, which compresses an input vector to a lower-dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector exactly (or approximately). In a previous study, Melkman et al. (2023) studied the depth and width of autoencoders using linear threshold activation functions with binary input and output vectors. We show that similar theoretical results hold if autoencoders using ReLU activation functions with real input and output vectors are used. Furthermore, we show that it is possible to compress input vectors to one-dimensional vectors using ReLU activation functions, although the size of compressed vectors is trivially Ω(log n) for autoencoders with linear threshold activation functions, where n is the number of input vectors. We also study the cases of linear activation functions. The results suggest that the compressive power of autoencoders using linear activation functions is considerably limited compared with those using ReLU activation functions.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 2","pages":"235-259"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning in Associative Networks Through Pavlovian Dynamics","authors":"Daniele Lotito;Miriam Aquaro;Chiara Marullo","doi":"10.1162/neco_a_01730","DOIUrl":"10.1162/neco_a_01730","url":null,"abstract":"Hebbian learning theory is rooted in Pavlov’s classical conditioning While mathematical models of the former have been proposed and studied in the past decades, especially in spin glass theory, only recently has it been numerically shown that it is possible to write neural and synaptic dynamics that mirror Pavlov conditioning mechanisms and also give rise to synaptic weights that correspond to the Hebbian learning rule. In this article we show that the same dynamics can be derived with equilibrium statistical mechanics tools and basic and motivated modeling assumptions. Then we show how to study the resulting system of coupled stochastic differential equations assuming the reasonable separation of neural and synaptic timescale. In particular, we analytically demonstrate that this synaptic evolution converges to the Hebbian learning rule in various settings and compute the variance of the stochastic process. Finally, drawing from evidence on pure memory reinforcement during sleep stages, we show how the proposed model can simulate neural networks that undergo sleep-associated memory consolidation processes, thereby proving the compatibility of Pavlovian learning with dreaming mechanisms.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 2","pages":"311-343"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalization Guarantees of Gradient Descent for Shallow Neural Networks","authors":"Puyu Wang;Yunwen Lei;Di Wang;Yiming Ying;Ding-Xuan Zhou","doi":"10.1162/neco_a_01725","DOIUrl":"10.1162/neco_a_01725","url":null,"abstract":"Significant progress has been made recently in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling. Here, network scaling corresponds to the normalization of the layers. In this article, we greatly extend the previous work (Lei et al., 2022; Richards & Kuzborskij, 2021) by conducting a comprehensive stability and generalization analysis of GD for two-layer and three-layer NNs. For two-layer NNs, our results are established under general network scaling, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating its nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of overparameterization. As a direct application of our general findings, we derive the excess risk rate of O(1/n) for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for underparameterized and overparameterized NNs trained by GD to attain the desired risk rate of O(1/n). Moreover, we demonstrate that as the scaling factor increases or the network complexity decreases, less overparameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of O(1/n) for GD in both two-layer and three-layer NNs.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 2","pages":"344-402"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bounded Rational Decision Networks With Belief Propagation","authors":"Gerrit Schmid;Sebastian Gottwald;Daniel A. Braun","doi":"10.1162/neco_a_01719","DOIUrl":"10.1162/neco_a_01719","url":null,"abstract":"Complex information processing systems that are capable of a wide variety of tasks, such as the human brain, are composed of specialized units that collaborate and communicate with each other. An important property of such information processing networks is locality: there is no single global unit controlling the modules, but information is exchanged locally. Here, we consider a decision-theoretic approach to study networks of bounded rational decision makers that are allowed to specialize and communicate with each other. In contrast to previous work that has focused on feedforward communication between decision-making agents, we consider cyclical information processing paths allowing for back-and-forth communication. We adapt message-passing algorithms to suit this purpose, essentially allowing for local information flow between units and thus enabling circular dependency structures. We provide examples that show how repeated communication can increase performance given that each unit’s information processing capability is limited and that decision-making systems with too few or too many connections and feedback loops achieve suboptimal utility.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 1","pages":"76-127"},"PeriodicalIF":2.7,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10810330","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Max Dabagia;Christos H. Papadimitriou;Santosh S. Vempala
{"title":"Computation With Sequences of Assemblies in a Model of the Brain","authors":"Max Dabagia;Christos H. Papadimitriou;Santosh S. Vempala","doi":"10.1162/neco_a_01720","DOIUrl":"10.1162/neco_a_01720","url":null,"abstract":"Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain’s learning capabilities remain unmatched. How cognition arises from neural activity is the central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou et al. (2020) and has been subsequently shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, navigation, to list a few). Here we show that in the same model, sequential precedence can be captured naturally through synaptic weights and plasticity, and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. If the stimulus sequence is presented to two brain areas simultaneously, a scaffolded representation is created, resulting in more efficient memorization and recall, in agreement with cognitive experiments. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences. Through an extension of this mechanism, the model can be shown to be capable of universal computation. Taken together, these results provide a concrete hypothesis for the basis of the brain’s remarkable abilities to compute and learn, with sequences playing a vital role.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 1","pages":"193-233"},"PeriodicalIF":2.7,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher J. Kymn;Denis Kleyko;E. Paxon Frady;Connor Bybee;Pentti Kanerva;Friedrich T. Sommer;Bruno A. Olshausen
{"title":"Computing With Residue Numbers in High-Dimensional Representation","authors":"Christopher J. Kymn;Denis Kleyko;E. Paxon Frady;Connor Bybee;Pentti Kanerva;Friedrich T. Sommer;Bruno A. Olshausen","doi":"10.1162/neco_a_01723","DOIUrl":"10.1162/neco_a_01723","url":null,"abstract":"We introduce residue hyperdimensional computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using resources that scale only logarithmically with the range, a vast improvement over previous methods. It also exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 1","pages":"1-37"},"PeriodicalIF":2.7,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomohiro Shiraishi;Daiki Miwa;Vo Nguyen Le Duy;Ichiro Takeuchi
{"title":"Selective Inference for Change Point Detection by Recurrent Neural Network","authors":"Tomohiro Shiraishi;Daiki Miwa;Vo Nguyen Le Duy;Ichiro Takeuchi","doi":"10.1162/neco_a_01724","DOIUrl":"10.1162/neco_a_01724","url":null,"abstract":"In this study, we investigate the quantification of the statistical reliability of detected change points (CPs) in time series using a recurrent neural network (RNN). Thanks to its flexibility, RNN holds the potential to effectively identify CPs in time series characterized by complex dynamics. However, there is an increased risk of erroneously detecting random noise fluctuations as CPs. The primary goal of this study is to rigorously control the risk of false detections by providing theoretically valid p-values to the CPs detected by RNN. To achieve this, we introduce a novel method based on the framework of selective inference (SI). SI enables valid inferences by conditioning on the event of hypothesis selection, thus mitigating bias from generating and testing hypotheses on the same data. In this study, we apply an SI framework to RNN-based CP detection, where characterizing the complex process of RNN selecting CPs is our main technical challenge. We demonstrate the validity and effectiveness of the proposed method through artificial and real data experiments.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 1","pages":"160-192"},"PeriodicalIF":2.7,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}