{"title":"Toward Improving the Generation Quality of Autoregressive Slot VAEs","authors":"Patrick Emami;Pan He;Sanjay Ranka;Anand Rangarajan","doi":"10.1162/neco_a_01635","DOIUrl":"10.1162/neco_a_01635","url":null,"abstract":"Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (“slots”) from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multiobject relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multiobject environments demonstrate clear gains in unconditional scene generation quality. 
Detailed ablation studies are also provided that validate the two proposed improvements.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"858-896"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synaptic Information Storage Capacity Measured With Information Theory","authors":"Mohammad Samavat;Thomas M. Bartol;Kristen M. Harris;Terrence J. Sejnowski","doi":"10.1162/neco_a_01659","DOIUrl":"10.1162/neco_a_01659","url":null,"abstract":"Variation in the strength of synapses can be quantified by measuring the anatomical properties of synapses. Quantifying precision of synaptic plasticity is fundamental to understanding information storage and retrieval in neural circuits. Synapses from the same axon onto the same dendrite have a common history of coactivation, making them ideal candidates for determining the precision of synaptic plasticity based on the similarity of their physical dimensions. Here, the precision and amount of information stored in synapse dimensions were quantified with Shannon information theory, expanding prior analysis that used signal detection theory (Bartol et al., 2015). The two methods were compared using dendritic spine head volumes in the middle of the stratum radiatum of hippocampal area CA1 as well-defined measures of synaptic strength. Information theory delineated the number of distinguishable synaptic strengths based on nonoverlapping bins of dendritic spine head volumes. Shannon entropy was applied to measure synaptic information storage capacity (SISC) and resulted in a lower bound of 4.1 bits and upper bound of 4.59 bits of information based on 24 distinguishable sizes. We further compared the distribution of distinguishable sizes and a uniform distribution using Kullback-Leibler divergence and discovered that there was a nearly uniform distribution of spine head volumes across the sizes, suggesting optimal use of the distinguishable values. Thus, SISC provides a new analytical measure that can be generalized to probe synaptic strengths and capacity for plasticity in different brain regions of different species and among animals raised in different conditions or during learning. 
How brain diseases and disorders affect the precision of synaptic plasticity can also be probed.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"781-802"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140779632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
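The information-theoretic quantities named in the abstract above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis code: the function names and the skewed example distribution are hypothetical, and the only number taken from the abstract is the count of 24 distinguishable sizes. For N equally likely sizes the Shannon entropy is log2(N), which for N = 24 is about 4.585 bits, close to the reported upper bound; whether the published bound was derived exactly this way is an assumption.

```python
import math

def shannon_entropy_bits(probs):
    """Shannon entropy H(p) in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def kl_to_uniform_bits(probs):
    """KL divergence D(p || uniform) in bits over len(probs) bins."""
    n = len(probs)
    return sum(p * math.log2(p * n) for p in probs if p > 0)

# 24 distinguishable sizes is the figure quoted in the abstract.
n_sizes = 24
upper_bound = math.log2(n_sizes)  # entropy if all sizes are equally likely

# Hypothetical, mildly non-uniform occupancy of the 24 bins (illustration only).
skewed = [1.5 / n_sizes] * 12 + [0.5 / n_sizes] * 12

entropy_skewed = shannon_entropy_bits(skewed)
kl_skewed = kl_to_uniform_bits(skewed)

print(f"log2({n_sizes}) = {upper_bound:.3f} bits")
print(f"entropy of skewed example = {entropy_skewed:.3f} bits")
print(f"KL to uniform = {kl_skewed:.3f} bits")
```

The identity H(p) = log2(N) - D(p || uniform) ties the two measures together: a KL divergence near zero means the entropy sits near its maximum, which is the sense in which a nearly uniform distribution makes full use of the distinguishable values.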
{"title":"Heterogeneous Forgetting Rates and Greedy Allocation in Slot-Based Memory Networks Promotes Signal Retention","authors":"BethAnna Jones;Lawrence Snyder;ShiNung Ching","doi":"10.1162/neco_a_01655","DOIUrl":"10.1162/neco_a_01655","url":null,"abstract":"A key question in the neuroscience of memory encoding pertains to the mechanisms by which afferent stimuli are allocated within memory networks. This issue is especially pronounced in the domain of working memory, where capacity is finite. Presumably the brain must embed some “policy” by which to allocate these mnemonic resources in an online manner in order to maximally represent and store afferent information for as long as possible and without interference from subsequent stimuli. Here, we engage this question through a top-down theoretical modeling framework. We formally optimize a gating mechanism that projects afferent stimuli onto a finite number of memory slots within a recurrent network architecture. In the absence of external input, the activity in each slot attenuates over time (i.e., a process of gradual forgetting). It turns out that the optimal gating policy consists of a direct projection from sensory activity to memory slots, alongside an activity-dependent lateral inhibition. Interestingly, allocating resources myopically (greedily with respect to the current stimulus) leads to efficient utilization of slots over time. In other words, later-arriving stimuli are distributed across slots in such a way that the network state is minimally shifted and so prior signals are minimally “overwritten.” Further, networks with heterogeneity in the timescales of their forgetting rates retain stimuli better than those that are more homogeneous. 
Our results suggest how online, recurrent networks working on temporally localized objectives without high-level supervision can nonetheless implement efficient allocation of memory resources over time.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"1022-1040"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140772905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instance-Specific Model Perturbation Improves Generalized Zero-Shot Learning","authors":"Guanyu Yang;Kaizhu Huang;Rui Zhang;Xi Yang","doi":"10.1162/neco_a_01639","DOIUrl":"10.1162/neco_a_01639","url":null,"abstract":"Zero-shot learning (ZSL) refers to the design of predictive functions on new classes (unseen classes) of data that have never been seen during training. In a more practical scenario, generalized zero-shot learning (GZSL) requires predicting both seen and unseen classes accurately. In the absence of target samples, many GZSL models may overfit training data and are inclined to predict individuals as categories that have been seen in training. To alleviate this problem, we develop a parameter-wise adversarial training process that promotes robust recognition of seen classes while designing during the test a novel model perturbation mechanism to ensure sufficient sensitivity to unseen classes. Concretely, adversarial perturbation is conducted on the model to obtain instance-specific parameters so that predictions can be biased to unseen classes in the test. Meanwhile, the robust training encourages the model robustness, leading to nearly unaffected prediction for seen classes. Moreover, perturbations in the parameter space, computed from multiple individuals simultaneously, can be used to avoid the effect of perturbations that are too extreme and ruin the predictions. 
Comparison results on four benchmark ZSL data sets show the effective improvement that the proposed framework made on zero-shot methods with learned metrics.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"936-962"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CA3 Circuit Model Compressing Sequential Information in Theta Oscillation and Replay","authors":"Satoshi Kuroki;Kenji Mizuseki","doi":"10.1162/neco_a_01641","DOIUrl":"10.1162/neco_a_01641","url":null,"abstract":"The hippocampus plays a critical role in the compression and retrieval of sequential information. During wakefulness, it achieves this through theta phase precession and theta sequences. Subsequently, during periods of sleep or rest, the compressed information reactivates through sharp-wave ripple events, manifesting as memory replay. However, how these sequential neuronal activities are generated and how they store information about the external environment remain unknown. We developed a hippocampal cornu ammonis 3 (CA3) computational model based on anatomical and electrophysiological evidence from the biological CA3 circuit to address these questions. The model comprises theta rhythm inhibition, place input, and CA3-CA3 plastic recurrent connection. The model can compress the sequence of the external inputs, reproduce theta phase precession and replay, learn additional sequences, and reorganize previously learned sequences. A gradual increase in synaptic inputs, controlled by interactions between theta-paced inhibition and place inputs, explained the mechanism of sequence acquisition. 
This model highlights the crucial role of plasticity in the CA3 recurrent connection and theta oscillational dynamics and hypothesizes how the CA3 circuit acquires, compresses, and replays sequential information.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"501-548"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10535082","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Column Row Convolutional Neural Network: Reducing Parameters for Efficient Image Processing","authors":"Seongil Im;Jae-Seung Jeong;Junseo Lee;Changhwan Shin;Jeong Ho Cho;Hyunsu Ju","doi":"10.1162/neco_a_01653","DOIUrl":"10.1162/neco_a_01653","url":null,"abstract":"Recent advancements in deep learning have achieved significant progress by increasing the number of parameters in a given model. However, this comes at the cost of computing resources, prompting researchers to explore model compression techniques that reduce the number of parameters while maintaining or even improving performance. Convolutional neural networks (CNN) have been recognized as more efficient and effective than fully connected (FC) networks. We propose a column row convolutional neural network (CRCNN) in this letter that applies 1D convolution to image data, significantly reducing the number of learning parameters and operational steps. The CRCNN uses column and row local receptive fields to perform data abstraction, concatenating each direction's feature before connecting it to an FC layer. Experimental results demonstrate that the CRCNN maintains comparable accuracy while reducing the number of parameters compared to prior work. Moreover, the CRCNN is employed for one-class anomaly detection, demonstrating its feasibility for various applications.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"744-758"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequency Propagation: Multimechanism Learning in Nonlinear Physical Networks","authors":"Vidyesh Rao Anisetti;Ananth Kandala;Benjamin Scellier;J. M. Schwarz","doi":"10.1162/neco_a_01648","DOIUrl":"10.1162/neco_a_01648","url":null,"abstract":"We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an activation signal and an error signal whose coefficients can be read in different frequencies of the frequency domain. Each conductance is updated proportionally to the product of the two coefficients. The learning rule is local and proved to perform gradient descent on a loss function. We argue that frequency propagation is an instance of a multimechanism learning strategy for physical networks, be it resistive, elastic, or flow networks. Multimechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals in the training process. Locally available information about these two signals is then used to update the trainable parameters to perform gradient descent. 
We demonstrate how earlier work implementing learning via chemical signaling in flow networks (Anisetti, Scellier, et al., 2023) also falls under the rubric of multimechanism learning.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"596-620"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Korobov Functions by Correntropy and Convolutional Neural Networks","authors":"Zhiying Fang;Tong Mao;Jun Fan","doi":"10.1162/neco_a_01650","DOIUrl":"10.1162/neco_a_01650","url":null,"abstract":"Combining information-theoretic learning with deep learning has gained significant attention in recent years, as it offers a promising approach to tackle the challenges posed by big data. However, the theoretical understanding of convolutional structures, which are vital to many structured deep learning models, remains incomplete. To partially bridge this gap, this letter aims to develop generalization analysis for deep convolutional neural network (CNN) algorithms using learning theory. Specifically, we focus on investigating robust regression using correntropy-induced loss functions derived from information-theoretic learning. Our analysis demonstrates an explicit convergence rate for deep CNN-based robust regression algorithms when the target function resides in the Korobov space. This study sheds light on the theoretical underpinnings of CNNs and provides a framework for understanding their performance and limitations.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"718-743"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lateral Connections Improve Generalizability of Learning in a Simple Neural Network","authors":"Garrett Crutcher","doi":"10.1162/neco_a_01640","DOIUrl":"10.1162/neco_a_01640","url":null,"abstract":"To navigate the world around us, neural circuits rapidly adapt to their environment learning generalizable strategies to decode information. When modeling these learning strategies, network models find the optimal solution to satisfy one task condition but fail when introduced to a novel task or even a different stimulus in the same space. In the experiments described in this letter, I investigate the role of lateral gap junctions in learning generalizable strategies to process information. Lateral gap junctions are formed by connexin proteins creating an open pore that allows for direct electrical signaling between two neurons. During neural development, the rate of gap junctions is high, and daughter cells that share similar tuning properties are more likely to be connected by these junctions. Gap junctions are highly plastic and get heavily pruned throughout development. I hypothesize that they mediate generalized learning by imprinting the weighting structure within a layer to avoid overfitting to one task condition. To test this hypothesis, I implemented a feedforward probabilistic neural network mimicking a cortical fast spiking neuron circuit that is heavily involved in movement. Many of these cells are tuned to speeds that I used as the input stimulus for the network to estimate. When training this network using a delta learning rule, both a laterally connected network and an unconnected network can estimate a single speed. However, when asking the network to estimate two or more speeds, alternated in training, an unconnected network either cannot learn speed or optimizes to a singular speed, while the laterally connected network learns the generalizable strategy and can estimate both speeds. 
These results suggest that lateral gap junctions between neurons enable generalized learning, which may help explain learning differences across life span.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"705-717"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probing the Structure and Functional Properties of the Dropout-Induced Correlated Variability in Convolutional Neural Networks","authors":"Xu Pan;Ruben Coen-Cagli;Odelia Schwartz","doi":"10.1162/neco_a_01652","DOIUrl":"10.1162/neco_a_01652","url":null,"abstract":"Computational neuroscience studies have shown that the structure of neural variability to an unchanged stimulus affects the amount of information encoded. Some artificial deep neural networks, such as those with Monte Carlo dropout layers, also have variable responses when the input is fixed. However, the structure of the trial-by-trial neural covariance in neural networks with dropout has not been studied, and its role in decoding accuracy is unknown. We studied the above questions in a convolutional neural network model with dropout in both the training and testing phases. We found that trial-by-trial correlation between neurons (i.e., noise correlation) is positive and low dimensional. Neurons that are close in a feature map have larger noise correlation. These properties are surprisingly similar to the findings in the visual cortex. We further analyzed the alignment of the main axes of the covariance matrix. We found that different images share a common trial-by-trial noise covariance subspace, and they are aligned with the global signal covariance. This evidence that the noise covariance is aligned with signal covariance suggests that noise covariance in dropout neural networks reduces network accuracy, which we further verified directly with a trial-shuffling procedure commonly used in neuroscience. These findings highlight a previously overlooked aspect of dropout layers that can affect network performance. 
Such dropout networks could also potentially be a computational model of neural variability.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"621-644"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
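The noise correlation and trial-shuffling procedure mentioned in the abstract above can be sketched on synthetic data. This is a toy illustration under stated assumptions, not the authors' network or analysis: the two simulated units, the shared-noise model, and the weights 0.8 and 0.6 are all hypothetical. Responses to one fixed stimulus are generated with a common fluctuation across trials, the trial-by-trial correlation is measured, and then the trial order of one unit is permuted, which preserves each unit's own response distribution while removing the correlation between units.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_trials = 2000

# Hypothetical model: a fluctuation shared by both units on each trial
# (standing in for correlated variability) plus private noise per unit.
shared = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
unit_a = [1.0 + 0.8 * s + 0.6 * rng.gauss(0.0, 1.0) for s in shared]
unit_b = [1.0 + 0.8 * s + 0.6 * rng.gauss(0.0, 1.0) for s in shared]

# Noise correlation: correlation across repeated trials of the same stimulus.
noise_corr = pearson(unit_a, unit_b)

# Trial shuffling: permute the trial order of one unit independently.
shuffled_b = unit_b[:]
rng.shuffle(shuffled_b)
shuffled_corr = pearson(unit_a, shuffled_b)

print(f"noise correlation before shuffling: {noise_corr:.3f}")
print(f"noise correlation after shuffling:  {shuffled_corr:.3f}")
```

Comparing a decoder's accuracy on the original and the shuffled responses is the usual way to ask whether the correlated part of the variability helps or hurts; the sketch stops at showing that shuffling removes the correlation.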