{"title":"The Limiting Dynamics of SGD: Modified Loss, Phase-Space Oscillations, and Anomalous Diffusion","authors":"Daniel Kunin;Javier Sagastuy-Brena;Lauren Gillespie;Eshed Margalit;Hidenori Tanaka;Surya Ganguli;Daniel L. K. Yamins","doi":"10.1162/neco_a_01626","DOIUrl":"10.1162/neco_a_01626","url":null,"abstract":"In this work, we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance traveled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate interaction among the hyperparameters of optimization, the structure in the gradient noise, and the Hessian matrix at the end of training that explains this anomalous diffusion. To build this understanding, we first derive a continuous-time model for SGD with finite learning rates and batch sizes as an underdamped Langevin equation. We study this equation in the setting of linear regression, where we can derive exact, analytic expressions for the phase-space dynamics of the parameters and their instantaneous velocities from initialization to stationarity. Using the Fokker-Planck equation, we show that the key ingredient driving these dynamics is not the original training loss but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents that cause oscillations in phase space. We identify qualitative and quantitative predictions of this theory in the dynamics of a ResNet-18 model trained on ImageNet. Through the lens of statistical physics, we uncover a mechanistic origin for the anomalous limiting dynamics of deep neural networks trained with SGD. Understanding the limiting dynamics of SGD, and its dependence on various important hyperparameters like batch size, learning rate, and momentum, can serve as a basis for future work that can turn these insights into algorithmic gains.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 1","pages":"151-174"},"PeriodicalIF":2.9,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138489123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Evaluation of Matrix Factorization for fMRI Data","authors":"Yusuke Endo;Koujin Takeda","doi":"10.1162/neco_a_01628","DOIUrl":"10.1162/neco_a_01628","url":null,"abstract":"A hypothesis in the study of the brain is that sparse coding is realized in information representation of external stimuli, which has been experimentally confirmed for visual stimulus recently. However, unlike the specific functional region in the brain, sparse coding in information processing in the whole brain has not been clarified sufficiently. In this study, we investigate the validity of sparse coding in the whole human brain by applying various matrix factorization methods to functional magnetic resonance imaging data of neural activities in the brain. The result suggests the sparse coding hypothesis in information representation in the whole human brain, because extracted features from the sparse matrix factorization (MF) method, sparse principal component analysis (SparsePCA), or method of optimal directions (MOD) under a high sparsity setting or an approximate sparse MF method, fast independent component analysis (FastICA), can classify external visual stimuli more accurately than the nonsparse MF method or sparse MF method under a low sparsity setting.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 1","pages":"128-150"},"PeriodicalIF":2.9,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138489121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cocaine Use Prediction With Tensor-Based Machine Learning on Multimodal MRI Connectome Data","authors":"Anru R. Zhang;Ryan P. Bell;Chen An;Runshi Tang;Shana A. Hall;Cliburn Chan;Kareem Al-Khalil;Christina S. Meade","doi":"10.1162/neco_a_01623","DOIUrl":"10.1162/neco_a_01623","url":null,"abstract":"This letter considers the use of machine learning algorithms for predicting cocaine use based on magnetic resonance imaging (MRI) connectomic data. The study used functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275 individuals, which was then parcellated into 246 regions of interest (ROIs) using the Brainnetome atlas. After data preprocessing, the data sets were transformed into tensor form. We developed a tensor-based unsupervised machine learning algorithm to reduce the size of the data tensor from 275 (individuals) × 2 (fMRI and dMRI) × 246 (ROIs) × 246 (ROIs) to 275 (individuals) × 2 (fMRI and dMRI) × 6 (clusters) × 6 (clusters). This was achieved by applying the high-order Lloyd algorithm to group the ROI data into six clusters. Features were extracted from the reduced tensor and combined with demographic features (age, gender, race, and HIV status). The resulting data set was used to train a Catboost model using subsampling and nested cross-validation techniques, which achieved a prediction accuracy of 0.857 for identifying cocaine users. The model was also compared with other models, and the feature importance of the model was presented. Overall, this study highlights the potential for using tensor-based machine learning algorithms to predict cocaine use based on MRI connectomic data and presents a promising approach for identifying individuals at risk of substance abuse.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 1","pages":"107-127"},"PeriodicalIF":2.9,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138489119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronization and Clustering in Complex Quadratic Networks","authors":"Anca Rǎdulescu;Danae Evans;Amani-Dasia Augustin;Anthony Cooper;Johan Nakuci;Sarah Muldoon","doi":"10.1162/neco_a_01624","DOIUrl":"10.1162/neco_a_01624","url":null,"abstract":"Synchronization and clustering are well studied in the context of networks of oscillators, such as neuronal networks. However, this relationship is notoriously difficult to approach mathematically in natural, complex networks. Here, we aim to understand it in a canonical framework, using complex quadratic node dynamics, coupled in networks that we call complex quadratic networks (CQNs). We review previously defined extensions of the Mandelbrot and Julia sets for networks, focusing on the behavior of the node-wise projections of these sets and on describing the phenomena of node clustering and synchronization. One aspect of our work consists of exploring ties between a network's connectivity and its ensemble dynamics by identifying mechanisms that lead to clusters of nodes exhibiting identical or different Mandelbrot sets. Based on our preliminary analytical results (obtained primarily in two-dimensional networks), we propose that clustering is strongly determined by the network connectivity patterns, with the geometry of these clusters further controlled by the connection weights. Here, we first explore this relationship further, using examples of synthetic networks, increasing in size (from 3, to 5, to 20 nodes). We then illustrate the potential practical implications of synchronization in an existing set of whole brain, tractography-based networks obtained from 197 human subjects using diffusion tensor imaging. Understanding the similarities to how these concepts apply to CQNs contributes to our understanding of universal principles in dynamic networks and may help extend theoretical results to natural, complex systems.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 1","pages":"75-106"},"PeriodicalIF":2.9,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138489122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Predictive Coding: A Unifying Neural Model for Active Perception, Compositional Learning, and Hierarchical Planning","authors":"Rajesh P. N. Rao;Dimitrios C. Gklezakos;Vishwas Sathish","doi":"10.1162/neco_a_01627","DOIUrl":"10.1162/neco_a_01627","url":null,"abstract":"There is growing interest in predictive coding as a model of how the brain learns through predictions and prediction errors. Predictive coding models have traditionally focused on sensory coding and perception. Here we introduce active predictive coding (APC) as a unifying model for perception, action, and cognition. The APC model addresses important open problems in cognitive science and AI, including (1) how we learn compositional representations (e.g., part-whole hierarchies for equivariant vision) and (2) how we solve large-scale planning problems, which are hard for traditional reinforcement learning, by composing complex state dynamics and abstract actions from simpler dynamics and primitive actions. By using hypernetworks, self-supervised learning, and reinforcement learning, APC learns hierarchical world models by combining task-invariant state transition networks and task-dependent policy networks at multiple abstraction levels. We illustrate the applicability of the APC model to active visual perception and hierarchical planning. Our results represent, to our knowledge, the first proof-of-concept demonstration of a unified approach to addressing the part-whole learning problem in vision, the nested reference frames learning problem in cognition, and the integrated state-action hierarchy learning problem in reinforcement learning.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 1","pages":"1-32"},"PeriodicalIF":2.9,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138489040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Filter Model of Cerebellum for Biological Muscle Control With Spike Train Inputs","authors":"Emma Wilson","doi":"10.1162/neco_a_01617","DOIUrl":"10.1162/neco_a_01617","url":null,"abstract":"Prior applications of the cerebellar adaptive filter model have included a range of tasks within simulated and robotic systems. However, this has been limited to systems driven by continuous signals. Here, the adaptive filter model of the cerebellum is applied to the control of a system driven by spiking inputs by considering the problem of controlling muscle force. The performance of the standard adaptive filter algorithm is compared with the algorithm with a modified learning rule that minimizes inputs and a simple proportional-integral-derivative (PID) controller. Control performance is evaluated in terms of the number of spikes, the accuracy of spike input locations, and the accuracy of muscle force output. Results show that the cerebellar adaptive filter model can be applied without change to the control of systems driven by spiking inputs. The cerebellar algorithm results in good agreement between input spikes and force outputs and significantly improves on a PID controller. Input minimization can be used to reduce the number of spike inputs, but at the expense of a decrease in accuracy of spike input location and force output. This work extends the applications of the cerebellar algorithm and demonstrates the potential of the adaptive filter model to be used to improve functional electrical stimulation muscle control.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"35 12","pages":"1938-1969"},"PeriodicalIF":2.9,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training a Hyperdimensional Computing Classifier Using a Threshold on Its Confidence","authors":"Laura Smets;Werner Van Leekwijck;Ing Jyh Tsang;Steven Latré","doi":"10.1162/neco_a_01618","DOIUrl":"10.1162/neco_a_01618","url":null,"abstract":"Hyperdimensional computing (HDC) has become popular for light-weight and energy-efficient machine learning, suitable for wearable Internet-of-Things devices and near-sensor or on-device processing. HDC is computationally less complex than traditional deep learning algorithms and achieves moderate to good classification performance. This letter proposes to extend the training procedure in HDC by taking into account not only wrongly classified samples but also samples that are correctly classified by the HDC model but with low confidence. We introduce a confidence threshold that can be tuned for each data set to achieve the best classification accuracy. The proposed training procedure is tested on UCIHAR, CTG, ISOLET, and HAND data sets for which the performance consistently improves compared to the baseline across a range of confidence threshold values. The extended training procedure also results in a shift toward higher confidence values of the correctly classified samples, making the classifier not only more accurate but also more confident about its predictions.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"35 12","pages":"2006-2023"},"PeriodicalIF":2.9,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness to Transformations Across Categories: Is Robustness Driven by Invariant Neural Representations?","authors":"Hojin Jang;Syed Suleman Abbas Zaidi;Xavier Boix;Neeraj Prasad;Sharon Gilad-Gutnick;Shlomit Ben-Ami;Pawan Sinha","doi":"10.1162/neco_a_01621","DOIUrl":"10.1162/neco_a_01621","url":null,"abstract":"Deep convolutional neural networks (DCNNs) have demonstrated impressive robustness to recognize objects under transformations (e.g., blur or noise) when these transformations are included in the training set. A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, to what extent this hypothesis holds true is an outstanding question, as robustness to transformations could be achieved with properties different from invariance; for example, parts of the network could be specialized to recognize either transformed or nontransformed images. This article investigates the conditions under which invariant neural representations emerge by leveraging that they facilitate robustness to transformations beyond the training distribution. Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. Invariance emerges only as the number of transformed categories in the training set is increased. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which it spontaneously emerges.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"35 12","pages":"1910-1937"},"PeriodicalIF":2.9,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive Coding as a Neuromorphic Alternative to Backpropagation: A Critical Evaluation","authors":"Umais Zahid;Qinghai Guo;Zafeirios Fountas","doi":"10.1162/neco_a_01620","DOIUrl":"10.1162/neco_a_01620","url":null,"abstract":"Backpropagation has rapidly become the workhorse credit assignment algorithm for modern deep learning methods. Recently, modified forms of predictive coding (PC), an algorithm with origins in computational neuroscience, have been shown to result in approximately or exactly equal parameter updates to those under backpropagation. Due to this connection, it has been suggested that PC can act as an alternative to backpropagation with desirable properties that may facilitate implementation in neuromorphic systems. Here, we explore these claims using the different contemporary PC variants proposed in the literature. We obtain time complexity bounds for these PC variants, which we show are lower bounded by backpropagation. We also present key properties of these variants that have implications for neurobiological plausibility and their interpretations, particularly from the perspective of standard PC as a variational Bayes algorithm for latent probabilistic models. Our findings shed new light on the connection between the two learning frameworks and suggest that in its current forms, PC may have more limited potential as a direct replacement of backpropagation than previously envisioned.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"35 12","pages":"1881-1909"},"PeriodicalIF":2.9,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications","authors":"Hiroyuki Hanada;Noriaki Hashimoto;Kouichi Taji;Ichiro Takeuchi","doi":"10.1162/neco_a_01619","DOIUrl":"10.1162/neco_a_01619","url":null,"abstract":"In this study, we have developed an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem holds practical importance in model selection, such as cross-validation (CV) and feature selection. Among the class of ML methods known as linear estimators, there exists an efficient model update framework, the low-rank update, that can effectively handle changes in a small number of rows and columns within the data matrix. However, for ML methods beyond linear estimators, there is currently no comprehensive framework available to obtain knowledge about the updated solution within a specific computational complexity. In light of this, our study introduces a the generalized low-rank update (GLRU) method, which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization, including commonly used methods such as support vector machines and logistic regression. The proposed GLRU method not only expands the range of its applicability but also provides information about the updated solutions with a computational complexity proportional to the number of data set changes. To demonstrate the effectiveness of the GLRU method, we conduct experiments showcasing its efficiency in performing cross-validation and feature selection compared to other baseline methods.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"35 12","pages":"1970-2005"},"PeriodicalIF":2.9,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41240926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}