Reinforcement learning approaches to hippocampus-dependent flexible spatial navigation

Charline Tessereau, R. O’Dea, S. Coombes, T. Bast
{"title":"海马依赖性柔性空间导航的强化学习方法","authors":"Charline Tessereau, R. O’Dea, S. Coombes, T. Bast","doi":"10.1101/2020.07.30.229005","DOIUrl":null,"url":null,"abstract":"Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations based on as few as one single experience. To study spatial navigation in the laboratory, watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to learn repeatedly new locations in a familiar environment, is hippocampal dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well do they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well do they map to neurobiological substrates involved in rapid place learning). We discuss how an actor–critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance as shown on watermaze and virtual delayed-matching-to-place tasks by rats and humans, respectively, if complemented with map-like place representations. The contribution of actor–critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor–critic mechanisms. Moreover, we illustrate that hierarchical computations embedded within an actor–critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control via a temporal-difference error from goal selection via a goal prediction error and may account for flexible, trial-specific, navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open field, including watermaze and virtual, delayed-matching-to-place tasks. Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representation, as such mechanisms are supported by substantial empirical evidence.","PeriodicalId":72444,"journal":{"name":"Brain and neuroscience advances","volume":"5 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Reinforcement learning approaches to hippocampus-dependent flexible spatial navigation\",\"authors\":\"Charline Tessereau, R. O’Dea, S. Coombes, T. 
Bast\",\"doi\":\"10.1101/2020.07.30.229005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations based on as few as one single experience. To study spatial navigation in the laboratory, watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to learn repeatedly new locations in a familiar environment, is hippocampal dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well do they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well do they map to neurobiological substrates involved in rapid place learning). We discuss how an actor–critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance as shown on watermaze and virtual delayed-matching-to-place tasks by rats and humans, respectively, if complemented with map-like place representations. The contribution of actor–critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor–critic mechanisms. Moreover, we illustrate that hierarchical computations embedded within an actor–critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control via a temporal-difference error from goal selection via a goal prediction error and may account for flexible, trial-specific, navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open field, including watermaze and virtual, delayed-matching-to-place tasks. 
Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representation, as such mechanisms are supported by substantial empirical evidence.\",\"PeriodicalId\":72444,\"journal\":{\"name\":\"Brain and neuroscience advances\",\"volume\":\"5 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Brain and neuroscience advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2020.07.30.229005\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Brain and neuroscience advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2020.07.30.229005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations based on as little as a single experience. To study spatial navigation in the laboratory, watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to repeatedly learn new locations in a familiar environment, is hippocampus-dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well they map to neurobiological substrates involved in rapid place learning). We discuss how an actor–critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance, as shown by rats on the watermaze delayed-matching-to-place task and by humans on its virtual analogue, if complemented with map-like place representations. The contribution of actor–critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor–critic mechanisms. Moreover, we illustrate that hierarchical computations embedded within an actor–critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control, driven by a temporal-difference error, from goal selection, driven by a goal prediction error, and may account for flexible, trial-specific navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open-field delayed-matching-to-place tasks, including the watermaze and virtual variants. Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representations, as such mechanisms are supported by substantial empirical evidence.
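
To make the actor–critic scheme described in the abstract concrete, the snippet below is a minimal illustrative sketch, not the authors' model: a critic learns the value of the current location and an actor learns direction preferences, both over Gaussian "place cell" features standing in for a map-like hippocampal representation, with a single temporal-difference error driving both updates. The arena geometry, place-field layout, platform location, learning rates and action set are all assumptions chosen for brevity.

```python
# Minimal actor-critic sketch for a watermaze-like task (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# Map-like place representation: Gaussian place fields tiling a circular arena.
N_CELLS, ARENA_RADIUS, FIELD_WIDTH = 100, 1.0, 0.2
angles = rng.uniform(0.0, 2.0 * np.pi, N_CELLS)
radii = ARENA_RADIUS * np.sqrt(rng.uniform(0.0, 1.0, N_CELLS))
centres = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

def place_activity(pos):
    """Population activity of the place cells at a 2-D position."""
    d2 = np.sum((centres - pos) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * FIELD_WIDTH ** 2))

# Actor-critic weights: the critic estimates location value, the actor scores headings.
N_ACTIONS = 8
headings = np.array([[np.cos(a), np.sin(a)]
                     for a in np.linspace(0.0, 2.0 * np.pi, N_ACTIONS, endpoint=False)])
w_critic = np.zeros(N_CELLS)
w_actor = np.zeros((N_ACTIONS, N_CELLS))
GAMMA, ALPHA_CRITIC, ALPHA_ACTOR, STEP = 0.98, 0.1, 0.05, 0.1

goal = np.array([0.5, 0.5])      # hidden-platform location (assumed)
GOAL_RADIUS = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run_trial(max_steps=1000):
    """One swim from a fixed start location; returns the latency (steps taken)."""
    global w_critic, w_actor
    pos = np.array([-0.5, -0.5])
    phi = place_activity(pos)
    for step in range(1, max_steps + 1):
        # Actor: sample a heading direction from a softmax over action preferences.
        probs = softmax(w_actor @ phi)
        a = rng.choice(N_ACTIONS, p=probs)
        new_pos = pos + STEP * headings[a]
        if np.linalg.norm(new_pos) > ARENA_RADIUS:   # stay inside the pool wall
            new_pos = pos
        reached = np.linalg.norm(new_pos - goal) < GOAL_RADIUS
        reward = 1.0 if reached else 0.0
        phi_new = place_activity(new_pos)
        # Critic: temporal-difference error on the value of the current location.
        v_new = 0.0 if reached else w_critic @ phi_new
        delta = reward + GAMMA * v_new - w_critic @ phi
        # The same TD error trains the critic (value) and the actor (direction choice).
        w_critic += ALPHA_CRITIC * delta * phi
        w_actor[a] += ALPHA_ACTOR * delta * phi
        if reached:
            return step
        pos, phi = new_pos, phi_new
    return max_steps

latencies = [run_trial() for _ in range(20)]
print(latencies)   # latencies should tend to shorten as value and policy are learned
```

Note that this incremental scheme improves only gradually over trials; as the abstract points out, reproducing one-trial place learning additionally requires map-like place representations and, for new goal locations, hippocampal plasticity mechanisms beyond slow weight updates.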