A theory of cerebellar learning as spike-based reinforcement learning in continuous time and space

Rin Kuriyama, Hideyuki Yoshimura, Tadashi Yamazaki

PNAS Nexus, 4(10), pgaf302. Published 2025-09-18 (eCollection 2025-10-01). DOI: https://doi.org/10.1093/pnasnexus/pgaf302
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483077/pdf/
Citations: 0
Abstract
The cerebellum has long been considered to perform error-based supervised learning via long-term depression (LTD) at synapses between parallel fibers and Purkinje cells (PCs). Since the discovery of multiple forms of synaptic plasticity other than LTD, recent studies have suggested that synergistic plasticity mechanisms could enhance the learning capability of the cerebellum. Indeed, we have proposed a conceptual view of the cerebellum as a reinforcement learning (RL) machine. However, a gap remains between the conceptual algorithm and its detailed implementation. To close this gap, in this research, we implemented a cerebellar spiking network as an RL model in continuous time and space, based on known anatomical properties of the cerebellum. We confirmed that our model successfully learned a state value and solved the mountain car task, a simple RL benchmark. Furthermore, our model demonstrated the ability to solve the delay eyeblink conditioning task using biologically plausible internal dynamics. Our research provides a solid foundation for a cerebellar RL theory that challenges the classical view of the cerebellum as primarily a supervised learning machine.
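For readers unfamiliar with the benchmark, the following is a minimal sketch of tabular TD(0) state-value learning on the classic mountain car task (dynamics as in Sutton and Barto). It is only illustrative: the paper's actual model is a continuous-time spiking cerebellar network, not this discrete-time tabular learner, and the hand-coded energy-pumping policy used here is an assumption for demonstration, not the paper's method.

```python
import math

# Classic mountain car bounds (Sutton & Barto formulation).
X_MIN, X_MAX = -1.2, 0.6
V_MIN, V_MAX = -0.07, 0.07

def step(x, v, a):
    """One transition of mountain car with action a in {-1, 0, +1}."""
    v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), V_MIN), V_MAX)
    x = x + v
    if x < X_MIN:          # inelastic wall at the left boundary
        x, v = X_MIN, 0.0
    return x, v

def bucket(x, v, n=20):
    """Discretize the continuous (position, velocity) state onto an n x n grid."""
    i = min(int((x - X_MIN) / (X_MAX - X_MIN) * n), n - 1)
    j = min(int((v - V_MIN) / (V_MAX - V_MIN) * n), n - 1)
    return i * n + j

def td0_episode(V, alpha=0.1, gamma=0.99):
    """Run one episode under a fixed energy-pumping policy (push along the
    current velocity) and update the tabular value estimate V by TD(0)."""
    x, v = -0.5, 0.0
    steps = 0
    while x < X_MAX and steps < 10_000:
        s = bucket(x, v)
        a = 1 if v >= 0 else -1           # hand-coded policy, not learned
        x, v = step(x, v, a)
        r = -1.0                          # cost of -1 per time step
        # TD(0): move V(s) toward r + gamma * V(s'); terminal value is 0.
        target = r if x >= X_MAX else r + gamma * V[bucket(x, v)]
        V[s] += alpha * (target - V[s])
        steps += 1
    return steps

V = [0.0] * 400
steps_to_goal = td0_episode(V)
```

After a single episode the car reaches the goal (the energy-pumping policy solves the task), and the visited states acquire negative values reflecting the remaining cost-to-go, which is the quantity the paper's model learns in spiking form.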