{"title":"基于随机反馈和生物约束的循环网络状态表示在线强化学习。","authors":"Takayuki Tsurumi, Ayaka Kato, Arvind Kumar, Kenji Morita","doi":"10.7554/eLife.104101","DOIUrl":null,"url":null,"abstract":"<p><p>Representation of external and internal states in the brain plays a critical role in enabling suitable behavior. Recent studies suggest that state representation and state value can be simultaneously learned through Temporal-Difference-Reinforcement-Learning (TDRL) and Backpropagation-Through-Time (BPTT) in recurrent neural networks (RNNs) and their readout. However, neural implementation of such learning remains unclear as BPTT requires offline update using transported downstream weights, which is suggested to be biologically implausible. We demonstrate that simple online training of RNNs using TD reward prediction error and random feedback, without additional memory or eligibility trace, can still learn the structure of tasks with cue-reward delay and timing variability. This is because TD learning itself is a solution for temporal credit assignment, and feedback alignment, a mechanism originally proposed for supervised learning, enables gradient approximation without weight transport. Furthermore, we show that biologically constraining downstream weights and random feedback to be non-negative not only preserves learning but may even enhance it because the non-negative constraint ensures loose alignment-allowing the downstream and feedback weights to roughly align from the beginning. These results provide insights into the neural mechanisms underlying the learning of state representation and value, highlighting the potential of random feedback and biological constraints.</p>","PeriodicalId":11640,"journal":{"name":"eLife","volume":"14 ","pages":""},"PeriodicalIF":6.4000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12459954/pdf/","citationCount":"0","resultStr":"{\"title\":\"Online reinforcement learning of state representation in recurrent network supported by the power of random feedback and biological constraints.\",\"authors\":\"Takayuki Tsurumi, Ayaka Kato, Arvind Kumar, Kenji Morita\",\"doi\":\"10.7554/eLife.104101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Representation of external and internal states in the brain plays a critical role in enabling suitable behavior. Recent studies suggest that state representation and state value can be simultaneously learned through Temporal-Difference-Reinforcement-Learning (TDRL) and Backpropagation-Through-Time (BPTT) in recurrent neural networks (RNNs) and their readout. However, neural implementation of such learning remains unclear as BPTT requires offline update using transported downstream weights, which is suggested to be biologically implausible. We demonstrate that simple online training of RNNs using TD reward prediction error and random feedback, without additional memory or eligibility trace, can still learn the structure of tasks with cue-reward delay and timing variability. This is because TD learning itself is a solution for temporal credit assignment, and feedback alignment, a mechanism originally proposed for supervised learning, enables gradient approximation without weight transport. 
Furthermore, we show that biologically constraining downstream weights and random feedback to be non-negative not only preserves learning but may even enhance it because the non-negative constraint ensures loose alignment-allowing the downstream and feedback weights to roughly align from the beginning. These results provide insights into the neural mechanisms underlying the learning of state representation and value, highlighting the potential of random feedback and biological constraints.</p>\",\"PeriodicalId\":11640,\"journal\":{\"name\":\"eLife\",\"volume\":\"14 \",\"pages\":\"\"},\"PeriodicalIF\":6.4000,\"publicationDate\":\"2025-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12459954/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"eLife\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.7554/eLife.104101\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"eLife","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.7554/eLife.104101","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
Online reinforcement learning of state representation in recurrent network supported by the power of random feedback and biological constraints.
Representation of external and internal states in the brain plays a critical role in enabling suitable behavior. Recent studies suggest that state representation and state value can be simultaneously learned through Temporal-Difference-Reinforcement-Learning (TDRL) and Backpropagation-Through-Time (BPTT) in recurrent neural networks (RNNs) and their readout. However, neural implementation of such learning remains unclear as BPTT requires offline updates using transported downstream weights, which is suggested to be biologically implausible. We demonstrate that simple online training of RNNs using TD reward prediction error and random feedback, without additional memory or eligibility trace, can still learn the structure of tasks with cue-reward delay and timing variability. This is because TD learning itself is a solution for temporal credit assignment, and feedback alignment, a mechanism originally proposed for supervised learning, enables gradient approximation without weight transport. Furthermore, we show that biologically constraining downstream weights and random feedback to be non-negative not only preserves learning but may even enhance it because the non-negative constraint ensures loose alignment, allowing the downstream and feedback weights to roughly align from the beginning. These results provide insights into the neural mechanisms underlying the learning of state representation and value, highlighting the potential of random feedback and biological constraints.
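To make the idea in the abstract concrete, below is a minimal sketch of online TD learning in an RNN in which a fixed random feedback vector stands in for the transported readout weights (feedback alignment), and both the readout and the feedback are constrained to be non-negative. The network size, ReLU dynamics, task timing, hyperparameters, and update bookkeeping are illustrative assumptions, not the paper's actual settings; the sketch only shows the general scheme: a TD reward prediction error computed from the value readout, and one-step local weight updates that use fixed random feedback instead of backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not the paper's settings)
n_in, n_rec = 1, 40         # input channels (cue only) and recurrent units
alpha, gamma = 0.01, 0.97   # learning rate and temporal discount factor

W_in  = rng.normal(0.0, 1.0 / np.sqrt(n_in),  (n_rec, n_in))
W_rec = rng.normal(0.0, 1.0 / np.sqrt(n_rec), (n_rec, n_rec))
w = rng.uniform(0.0, 0.1, n_rec)   # value (downstream) weights, kept non-negative
b = rng.uniform(0.0, 0.1, n_rec)   # fixed, non-negative random feedback weights

def run_trial(cue_reward_delay=10, n_steps=25):
    """One trial of a hypothetical cue-then-delayed-reward task."""
    global W_in, W_rec, w
    x_prev = np.zeros(n_rec)   # recurrent activity at t-1
    x      = np.zeros(n_rec)   # recurrent activity at t
    phi_g  = np.zeros(n_rec)   # ReLU derivative at the pre-activation that produced x
    u      = np.zeros(n_in)    # input at t
    for t in range(n_steps):
        u_next = np.zeros(n_in)
        if t == 0:
            u_next[0] = 1.0                            # cue at trial onset
        r = 1.0 if t == cue_reward_delay else 0.0      # single delayed reward
        pre    = W_rec @ x + W_in @ u_next
        x_next = np.maximum(0.0, pre)                  # ReLU dynamics (an assumption)
        # TD reward prediction error for the transition from x_t to x_{t+1}
        delta = r + gamma * (w @ x_next) - (w @ x)
        # Online updates of the parameters of V(x_t) = w @ x_t:
        # the readout uses its true local gradient (x_t), while the recurrent and
        # input weights use the FIXED random feedback b in place of the readout w
        # (feedback alignment) and only the most recent activity -- no
        # backpropagation through time, no additional eligibility trace.
        err_hidden = phi_g * (delta * b)
        W_rec += alpha * np.outer(err_hidden, x_prev)
        W_in  += alpha * np.outer(err_hidden, u)
        w = np.maximum(w + alpha * delta * x, 0.0)     # non-negative downstream weights
        x_prev, x, phi_g, u = x, x_next, (pre > 0).astype(float), u_next

for episode in range(200):
    run_trial()
print("learned value weights (first 5):", w[:5])
```

Because b is fixed and non-negative and w is clipped to remain non-negative, the two vectors are roughly aligned from the outset; this "loose alignment" is the intuition the abstract gives for why the biological non-negativity constraint can preserve, or even aid, learning.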
About the journal:
eLife is a distinguished, not-for-profit, peer-reviewed open access scientific journal that specializes in the fields of biomedical and life sciences. eLife is known for its selective publication process, which includes a variety of article types such as:
Research Articles: Detailed reports of original research findings.
Short Reports: Concise presentations of significant findings that do not warrant a full-length research article.
Tools and Resources: Descriptions of new tools, technologies, or resources that facilitate scientific research.
Research Advances: Brief reports on significant scientific advancements that have immediate implications for the field.
Scientific Correspondence: Short communications that comment on or provide additional information related to published articles.
Review Articles: Comprehensive overviews of a specific topic or field within the life sciences.