Emergence of belief-like representations through reinforcement learning.

Impact factor 4.3 | Zone 2, Biology

PLoS Computational Biology | Pub date: 2023-09-11 | eCollection date: 2023-09-01 | DOI: 10.1371/journal.pcbi.1011067

Jay A Hennig, Sandra A Romero Pinto, Takahiro Yamaguchi, Scott W Linderman, Naoshige Uchida, Samuel J Gershman

PLoS Computational Biology 19(9): e1011067. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10513382/pdf/

Citations: 0

Abstract



To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs" (optimal Bayesian estimates of the hidden states in the task). Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
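The abstract contrasts two routes to value estimation under partial observability: forming explicit beliefs and learning value on top of them, or (the paper's focus) letting an RNN map observations to value directly. The following is a minimal sketch of the first, belief-based route only, not the authors' model or code. It assumes a small discrete hidden Markov model with a known transition matrix `T[s][s_next]` and observation likelihood `O[s][obs]` (both hypothetical), a value function linear in the belief vector, and standard TD(0) learning, where the TD error `delta` plays the role of the reward prediction error.

```python
"""Illustrative belief-state TD(0) sketch for a partially observable task.

Assumptions (not taken from the paper): a discrete hidden Markov model with a
known transition matrix T[s][s_next] and observation likelihood O[s][obs],
and a value function that is linear in the belief vector.
"""
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def belief_update(belief: Sequence[float], obs: int, T: Matrix, O: Matrix) -> List[float]:
    """One step of Bayesian filtering: predict with T, then weight by P(obs | state)."""
    n = len(belief)
    # Prediction step: P(s_next | past observations) = sum_s b(s) * T[s][s_next]
    predicted = [sum(belief[s] * T[s][s_next] for s in range(n)) for s_next in range(n)]
    # Correction step: multiply by the observation likelihood and renormalize
    unnormalized = [predicted[s] * O[s][obs] for s in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


def value(w: Sequence[float], belief: Sequence[float]) -> float:
    """Linear value estimate V(b) = w . b."""
    return sum(wi * bi for wi, bi in zip(w, belief))


def td_step(
    w: Sequence[float],
    belief: Sequence[float],
    reward: float,
    next_belief: Sequence[float],
    gamma: float = 0.9,
    alpha: float = 0.1,
) -> Tuple[float, List[float]]:
    """TD(0) on belief features; returns the prediction error and updated weights."""
    delta = reward + gamma * value(w, next_belief) - value(w, belief)
    new_w = [wi + alpha * delta * bi for wi, bi in zip(w, belief)]
    return delta, new_w


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.0, 1.0]]  # hypothetical hidden-state dynamics
    O = [[0.8, 0.2], [0.3, 0.7]]  # hypothetical observation likelihoods
    b0 = [1.0, 0.0]
    b1 = belief_update(b0, obs=1, T=T, O=O)
    delta, w = td_step([0.0, 0.0], b0, reward=1.0, next_belief=b1)
    print(b1, delta, w)
```

In this toy setup, observing `obs=1` from a belief concentrated on state 0 shifts the belief to roughly `[0.72, 0.28]`. The belief update requires the full model (`T`, `O`), which is the scalability cost the abstract points to; an RNN trained only on the TD objective would instead have to learn a comparable representation from observations.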


Source journal

PLoS Computational Biology (Biology: Biochemical Research Methods)

CiteScore: 7.10

Self-citation rate: 4.70%

Articles published: 820

Journal description: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales (from molecules and cells to patient populations and ecosystems) through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, the reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise, to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.