A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment

IF 2.2 · CAS Region 4 (Psychology) · JCR Q3 (BEHAVIORAL SCIENCES)
Eric Chalmers, Artur Luczak
{"title":"一种生物启发强化学习模型,能说明惩罚后的快速适应性","authors":"Eric Chalmers ,&nbsp;Artur Luczak","doi":"10.1016/j.nlm.2024.107974","DOIUrl":null,"url":null,"abstract":"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>","PeriodicalId":19102,"journal":{"name":"Neurobiology of Learning and Memory","volume":"215 ","pages":"Article 107974"},"PeriodicalIF":2.2000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment\",\"authors\":\"Eric Chalmers ,&nbsp;Artur Luczak\",\"doi\":\"10.1016/j.nlm.2024.107974\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. 
The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>\",\"PeriodicalId\":19102,\"journal\":{\"name\":\"Neurobiology of Learning and Memory\",\"volume\":\"215 \",\"pages\":\"Article 107974\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurobiology of Learning and Memory\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856\",\"RegionNum\":4,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BEHAVIORAL SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurobiology of Learning and Memory","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1074742724000856","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BEHAVIORAL SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

Humans and animals can quickly learn a new strategy when a previously rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously learned strategies, a hallmark of impaired response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may operate at the whole-brain level. Surprisingly, this yields a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the reward prediction errors that normally drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card-sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.
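The abstract's core mechanism, a reward prediction error scaled by the likelihood of the chosen action, translates into a very small change to a standard Q-learning update: roughly Q(a) += alpha * p(a) * delta instead of Q(a) += alpha * delta. Below is a minimal Python sketch of that idea in a toy bandit reversal task. The softmax policy, the parameter values, and the reward schedule are illustrative assumptions on our part, not the authors' published implementation; see the paper for the exact equations.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_actions = 4
alpha = 0.1   # learning rate (same parameter as conventional Q-learning)
beta = 3.0    # softmax inverse temperature (assumed policy, for illustration)
Q = np.zeros(n_actions)

def softmax(q, beta):
    """Likelihood the agent assigns to each action, given its Q-values."""
    e = np.exp(beta * (q - q.max()))  # subtract max for numerical stability
    return e / e.sum()

def reward(action, trial):
    """Hypothetical reversal schedule: action 0 pays +1 for the first
    200 trials, then is punished while action 1 becomes rewarding."""
    good = 0 if trial < 200 else 1
    return 1.0 if action == good else -1.0

for trial in range(400):
    p = softmax(Q, beta)             # action likelihoods
    a = rng.choice(n_actions, p=p)   # sample an action
    delta = reward(a, trial) - Q[a]  # conventional reward prediction error
    Q[a] += alpha * p[a] * delta     # new rule: RPE scaled by likelihood p[a]
```

Note how the scaling produces the fast adaptation the abstract describes: a well-practiced action carries a high likelihood p[a], so when it is suddenly punished the large negative prediction error is amplified rather than damped, and its value collapses quickly, consistent with lose-shift behavior. The sketch is meant only to make the scaling mechanism concrete.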

Source journal
CiteScore: 5.10
Self-citation rate: 7.40%
Articles published: 77
Review time: 12.6 weeks
About the journal: Neurobiology of Learning and Memory publishes articles examining the neurobiological mechanisms underlying learning and memory at all levels of analysis ranging from molecular biology to synaptic and neural plasticity and behavior. We are especially interested in manuscripts that examine the neural circuits and molecular mechanisms underlying learning, memory and plasticity in both experimental animals and human subjects.