Spiking Neural Network Actor–Critic Reinforcement Learning with Temporal Coding and Reward-Modulated Plasticity
D. S. Vlasov, R. B. Rybka, A. V. Serenko, A. G. Sboev
Moscow University Physics Bulletin, vol. 79, no. 2 (supplement), pp. S944–S952
DOI: 10.3103/S0027134924702400
Published: 2025-03-22 (journal article)
https://link.springer.com/article/10.3103/S0027134924702400
Abstract
The article presents an algorithm for adjusting the synaptic weights of a spiking neural network with an actor–critic architecture. A distinctive feature of the algorithm is that input data are encoded temporally, that is, in spike times. The critic neuron computes the change in the expected value of the performed action from the difference between the spike times it produces when processing the previous state and the current state. The synaptic weights of both the actor and critic neurons are updated by a local plasticity rule (spike-timing-dependent plasticity, STDP), in which the weight change is modulated by the obtained estimate of the expected reward. The proposed learning algorithm was tested on the cart-pole balancing task, where it proved effective. The algorithm is an important step toward implementing reinforcement learning for spiking neural networks on neuromorphic computing devices.
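The abstract does not specify the neuron model, the coding window, or the plasticity constants, so the Python sketch below fills them in with hypothetical choices: time-to-first-spike coding over a window `T_MAX`, a non-leaky integrate-and-fire critic, a value readout V(s) = T_MAX - t_spike (an earlier spike means a higher value), and pair-based STDP scaled by the resulting temporal-difference error. It only illustrates how a TD error can be formed from the difference between the critic's spike times for the previous and current states and then used to modulate STDP; it is not the authors' implementation, and every name and constant in it is an assumption.

```python
import math
from typing import List, Sequence

T_MAX = 20.0     # hypothetical coding window (ms); "no output spike" is reported as T_MAX
THRESHOLD = 1.0  # hypothetical firing threshold of the critic neuron


def latency_encode(x: Sequence[float]) -> List[float]:
    """Time-to-first-spike coding (assumed): a value clipped to [0, 1] spikes at
    T_MAX * (1 - x), so larger inputs spike earlier."""
    return [T_MAX * (1.0 - min(max(v, 0.0), 1.0)) for v in x]


def first_spike_time(in_times: Sequence[float], w: Sequence[float]) -> float:
    """Non-leaky integrate-and-fire neuron (assumed model): each input spike adds
    its weight to the membrane potential as a step; the neuron fires at the input
    spike that first brings the potential to THRESHOLD."""
    potential = 0.0
    for t, wi in sorted(zip(in_times, w)):
        potential += wi
        if potential >= THRESHOLD:
            return t
    return T_MAX


def td_error(reward: float, t_prev: float, t_curr: float) -> float:
    """With V(s) = T_MAX - t_spike and discount factor 1 (both assumptions),
    delta = r + V(s') - V(s) reduces to r + t_prev - t_curr."""
    return reward + t_prev - t_curr


def r_stdp(w: Sequence[float], in_times: Sequence[float], t_post: float,
           delta: float, lr: float = 0.01, a_plus: float = 1.0,
           a_minus: float = 1.0, tau: float = 5.0,
           w_max: float = 1.0) -> List[float]:
    """Reward-modulated STDP: a pair-based STDP term for each synapse is scaled
    by delta, and weights are clipped to [0, w_max]."""
    new_w = []
    for wi, t_pre in zip(w, in_times):
        dt = t_post - t_pre
        if dt >= 0:  # presynaptic spike before (or with) the postsynaptic one
            stdp = a_plus * math.exp(-dt / tau)
        else:        # postsynaptic spike first: depression window
            stdp = -a_minus * math.exp(dt / tau)
        new_w.append(min(max(wi + lr * delta * stdp, 0.0), w_max))
    return new_w


if __name__ == "__main__":
    # One hypothetical transition from a previous state to a current state.
    w = [0.4, 0.4, 0.4]
    s_prev = latency_encode([0.2, 0.5, 0.9])
    s_curr = latency_encode([0.6, 0.7, 0.9])
    t_prev = first_spike_time(s_prev, w)
    t_curr = first_spike_time(s_curr, w)
    delta = td_error(reward=1.0, t_prev=t_prev, t_curr=t_curr)
    w = r_stdp(w, s_curr, t_curr, delta)  # update using the current state's spikes
    print(t_prev, t_curr, delta, w)
```

In this sketch the critic fires earlier for the current state than for the previous one, so delta is positive and the causal (pre-before-post) synapses are potentiated; a negative delta would flip the sign of the same STDP term. Applying the same delta-scaled rule to the actor's synapses is one plausible reading of the abstract, not a detail it confirms.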
About the journal:
Moscow University Physics Bulletin publishes original papers (reviews, articles, and brief communications) in the following fields of experimental and theoretical physics: theoretical and mathematical physics; physics of nuclei and elementary particles; radiophysics, electronics, and acoustics; optics and spectroscopy; laser physics; condensed matter physics; chemical physics, physical kinetics, and plasma physics; biophysics and medical physics; astronomy, astrophysics, and cosmology; and physics of the Earth, atmosphere, and hydrosphere.