{"title":"A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of the Multi-Armed Bandit.","authors":"Shintaro Nakamura, Masashi Sugiyama","doi":"10.1162/neco_a_01728","DOIUrl":"10.1162/neco_a_01728","url":null,"abstract":"<p><p>We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper-bound-matches the lower bound up to a problem-dependent constant factor. We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world data sets.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"294-310"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Compressive Power of Autoencoders With Linear and ReLU Activation Functions.","authors":"Liangjie Sun, Chenyao Wu, Wai-Ki Ching, Tatsuya Akutsu","doi":"10.1162/neco_a_01729","DOIUrl":"10.1162/neco_a_01729","url":null,"abstract":"<p><p>In this article, we mainly study the depth and width of autoencoders consisting of rectified linear unit (ReLU) activation functions. An autoencoder is a layered neural network consisting of an encoder, which compresses an input vector to a lower-dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector exactly (or approximately). In a previous study, Melkman et al. (2023) studied the depth and width of autoencoders using linear threshold activation functions with binary input and output vectors. We show that similar theoretical results hold if autoencoders using ReLU activation functions with real input and output vectors are used. Furthermore, we show that it is possible to compress input vectors to one-dimensional vectors using ReLU activation functions, although the size of compressed vectors is trivially Ω(log n) for autoencoders with linear threshold activation functions, where n is the number of input vectors. We also study the cases of linear activation functions. The results suggest that the compressive power of autoencoders using linear activation functions is considerably limited compared with those using ReLU activation functions.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"235-259"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalization Analysis of Transformers in Distribution Regression.","authors":"Peilin Liu, Ding-Xuan Zhou","doi":"10.1162/neco_a_01726","DOIUrl":"10.1162/neco_a_01726","url":null,"abstract":"<p><p>In recent years, models based on the transformer architecture have seen widespread applications and have become one of the core tools in the field of deep learning. Numerous successful and efficient techniques, such as parameter-efficient fine-tuning and efficient scaling, have been proposed surrounding their applications to further enhance performance. However, the success of these strategies has always lacked the support of rigorous mathematical theory. To study the underlying mechanisms behind transformers and related techniques, we first propose a transformer learning framework motivated by distribution regression, with distributions being inputs, connect a two-stage sampling process with natural language processing, and present a mathematical formulation of the attention mechanism called attention operator. We demonstrate that by the attention operator, transformers can compress distributions into function representations without loss of information. Moreover, with the advantages of our novel attention operator, transformers exhibit a stronger capability to learn functionals with more complex structures than convolutional neural networks and fully connected networks. Finally, we obtain a generalization bound within the distribution regression framework. Throughout theoretical results, we further discuss some successful techniques emerging with large language models (LLMs), such as prompt tuning, parameter-efficient fine-tuning, and efficient scaling. We also provide theoretical insights behind these techniques within our novel analysis framework.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"260-293"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning in Associative Networks Through Pavlovian Dynamics.","authors":"Daniele Lotito, Miriam Aquaro, Chiara Marullo","doi":"10.1162/neco_a_01730","DOIUrl":"10.1162/neco_a_01730","url":null,"abstract":"<p><p>Hebbian learning theory is rooted in Pavlov's classical conditioning While mathematical models of the former have been proposed and studied in the past decades, especially in spin glass theory, only recently has it been numerically shown that it is possible to write neural and synaptic dynamics that mirror Pavlov conditioning mechanisms and also give rise to synaptic weights that correspond to the Hebbian learning rule. In this article we show that the same dynamics can be derived with equilibrium statistical mechanics tools and basic and motivated modeling assumptions. Then we show how to study the resulting system of coupled stochastic differential equations assuming the reasonable separation of neural and synaptic timescale. In particular, we analytically demonstrate that this synaptic evolution converges to the Hebbian learning rule in various settings and compute the variance of the stochastic process. Finally, drawing from evidence on pure memory reinforcement during sleep stages, we show how the proposed model can simulate neural networks that undergo sleep-associated memory consolidation processes, thereby proving the compatibility of Pavlovian learning with dreaming mechanisms.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"311-343"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, Ding-Xuan Zhou
{"title":"Generalization Guarantees of Gradient Descent for Shallow Neural Networks.","authors":"Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, Ding-Xuan Zhou","doi":"10.1162/neco_a_01725","DOIUrl":"10.1162/neco_a_01725","url":null,"abstract":"<p><p>Significant progress has been made recently in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling. Here, network scaling corresponds to the normalization of the layers. In this article, we greatly extend the previous work (Lei et al., 2022; Richards & Kuzborskij, 2021) by conducting a comprehensive stability and generalization analysis of GD for two-layer and three-layer NNs. For two-layer NNs, our results are established under general network scaling, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating its nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of overparameterization. As a direct application of our general findings, we derive the excess risk rate of O(1/n) for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for underparameterized and overparameterized NNs trained by GD to attain the desired risk rate of O(1/n). Moreover, we demonstrate that as the scaling factor increases or the network complexity decreases, less overparameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of O(1/n) for GD in both two-layer and three-layer NNs.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"344-402"},"PeriodicalIF":2.7,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Replay as a Basis for Backpropagation Through Time in the Brain.","authors":"Huzi Cheng, Joshua W Brown","doi":"10.1162/neco_a_01735","DOIUrl":"https://doi.org/10.1162/neco_a_01735","url":null,"abstract":"<p><p>How episodic memories are formed in the brain is a continuing puzzle for the neuroscience community. The brain areas that are critical for episodic learning (e.g., the hippocampus) are characterized by recurrent connectivity and generate frequent offline replay events. The function of the replay events is a subject of active debate. Recurrent connectivity, computational simulations show, enables sequence learning when combined with a suitable learning algorithm such as backpropagation through time (BPTT). BPTT, however, is not biologically plausible. We describe here, for the first time, a biologically plausible variant of BPTT in a reversible recurrent neural network, R2N2, that critically leverages offline replay to support episodic learning. The model uses forward and backward offline replay to transfer information between two recurrent neural networks, a cache and a consolidator, that perform rapid one-shot learning and statistical learning, respectively. Unlike replay in standard BPTT, this architecture requires no artificial external memory store. This approach outperforms existing solutions like random feedback local online learning and reservoir network. It also accounts for the functional significance of hippocampal replay events. We demonstrate the R2N2 network properties using benchmark tests from computer science and simulate the rodent delayed alternation T-maze task.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-34"},"PeriodicalIF":2.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradual Domain Adaptation via Normalizing Flows.","authors":"Shogo Sagawa, Hideitsu Hino","doi":"10.1162/neco_a_01734","DOIUrl":"https://doi.org/10.1162/neco_a_01734","url":null,"abstract":"<p><p>Standard domain adaptation methods do not work well when a large gap exists between the source and target domains. Gradual domain adaptation is one of the approaches used to address the problem. It involves leveraging the intermediate domain, which gradually shifts from the source domain to the target domain. In previous work, it is assumed that the number of intermediate domains is large and the distance between adjacent domains is small; hence, the gradual domain adaptation algorithm, involving self-training with unlabeled data sets, is applicable. In practice, however, gradual self-training will fail because the number of intermediate domains is limited and the distance between adjacent domains is large. We propose the use of normalizing flows to deal with this problem while maintaining the framework of unsupervised domain adaptation. The proposed method learns a transformation from the distribution of the target domains to the gaussian mixture distribution via the source domain. We evaluate our proposed method by experiments using real-world data sets and confirm that it mitigates the problem we have explained and improves the classification performance.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-47"},"PeriodicalIF":2.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering Dynamical Equations of Stochastic Decision Models Using Data-Driven SINDy Algorithm.","authors":"Brendan Lenfesty, Saugat Bhattacharyya, KongFatt Wong-Lin","doi":"10.1162/neco_a_01736","DOIUrl":"https://doi.org/10.1162/neco_a_01736","url":null,"abstract":"<p><p>Decision formation in perceptual decision making involves sensory evidence accumulation instantiated by the temporal integration of an internal decision variable toward some decision criterion or threshold, as described by sequential sampling theoretical models. The decision variable can be represented in the form of experimentally observable neural activities. Hence, elucidating the appropriate theoretical model becomes crucial to understanding the mechanisms underlying perceptual decision formation. Existing computational methods are limited to either fitting of choice behavioral data or linear model estimation from neural activity data. In this work, we made use of sparse identification of nonlinear dynamics (SINDy), a data-driven approach, to elucidate the deterministic linear and nonlinear components of often-used stochastic decision models within reaction time task paradigms. Based on the simulated decision variable activities of the models and assuming the noise coefficient term is known beforehand, SINDy, enhanced with approaches using multiple trials, could readily estimate the deterministic terms in the dynamical equations, choice accuracy, and decision time of the models across a range of signal-to-noise ratio values. In particular, SINDy performed the best using the more memory-intensive multitrial approach while trial averaging of parameters performed more moderately. The single-trial approach, although expectedly not performing as well, may be useful for real-time modeling. Taken together, our work offers alternative approaches for SINDy to uncover the dynamics in perceptual decision making and, more generally, for first-passage time problems.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-19"},"PeriodicalIF":2.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Recall Accuracy in Sparse Associative Memories That Use Neurogenesis.","authors":"Katy Warr, Jonathon Hare, David Thomas","doi":"10.1162/neco_a_01732","DOIUrl":"https://doi.org/10.1162/neco_a_01732","url":null,"abstract":"<p><p>The creation of future low-power neuromorphic solutions requires specialist spiking neural network (SNN) algorithms that are optimized for neuromorphic settings. One such algorithmic challenge is the ability to recall learned patterns from their noisy variants. Solutions to this problem may be required to memorize vast numbers of patterns based on limited training data and subsequently recall the patterns in the presence of noise. To solve this problem, previous work has explored sparse associative memory (SAM)-associative memory neural models that exploit the principle of sparse neural coding observed in the brain. Research into a subcategory of SAM has been inspired by the biological process of adult neurogenesis, whereby new neurons are generated to facilitate adaptive and effective lifelong learning. Although these neurogenesis models have been demonstrated in previous research, they have limitations in terms of recall memory capacity and robustness to noise. In this letter, we provide a unifying framework for characterizing a type of SAM network that has been pretrained using a learning strategy that incorporated a simple neurogenesis model. Using this characterization, we formally define network topology and threshold optimization methods to empirically demonstrate greater than 10$^{{4}}$ times improvement in memory capacity compared to previous work. We show that these optimizations can facilitate the development of networks that have reduced interneuron connectivity while maintaining high recall efficacy. This paves the way for ongoing research into fast, effective, low-power realizations of associative memory on neuromorphic platforms.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-44"},"PeriodicalIF":2.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhichao Zhu, Yang Qi, Wenlian Lu, Zhigang Wang, Lu Cao, Jianfeng Feng
{"title":"Toward a Free-Response Paradigm of Decision-Making in Spiking Neural Networks.","authors":"Zhichao Zhu, Yang Qi, Wenlian Lu, Zhigang Wang, Lu Cao, Jianfeng Feng","doi":"10.1162/neco_a_01733","DOIUrl":"https://doi.org/10.1162/neco_a_01733","url":null,"abstract":"<p><p>Spiking neural networks (SNNs) have attracted significant interest in the development of brain-inspired computing systems due to their energy efficiency and similarities to biological information processing. In contrast to continuous-valued artificial neural networks, which produce results in a single step, SNNs require multiple steps during inference to achieve a desired accuracy level, resulting in a burden in real-time response and energy efficiency. Inspired by the tradeoff between speed and accuracy in human and animal decision-making processes, which exhibit correlations among reaction times, task complexity, and decision confidence, an inquiry emerges regarding how an SNN model can benefit by implementing these attributes. Here, we introduce a theory of decision making in SNNs by untangling the interplay between signal and noise. Under this theory, we introduce a new learning objective that trains an SNN not only to make the correct decisions but also to shape its confidence. Numerical experiments demonstrate that SNNs trained in this way exhibit improved confidence expression, reduced trial-to-trial variability, and shorter latency to reach the desired accuracy. We then introduce a stopping policy that can stop inference in a way that further enhances the time efficiency of SNNs. The stopping time can serve as an indicator to whether a decision is correct, akin to the reaction time in animal behavior experiments. By integrating stochasticity into decision making, this study opens up new possibilities to explore the capabilities of SNNs and advance SNNs and their applications in complex decision-making scenarios where model performance is limited.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-41"},"PeriodicalIF":2.7,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142958972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}