{"title":"一种生物启发强化学习模型,能说明惩罚后的快速适应性","authors":"Eric Chalmers , Artur Luczak","doi":"10.1016/j.nlm.2024.107974","DOIUrl":null,"url":null,"abstract":"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>","PeriodicalId":19102,"journal":{"name":"Neurobiology of Learning and Memory","volume":"215 ","pages":"Article 107974"},"PeriodicalIF":2.2000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment\",\"authors\":\"Eric Chalmers , Artur Luczak\",\"doi\":\"10.1016/j.nlm.2024.107974\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. 
The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>\",\"PeriodicalId\":19102,\"journal\":{\"name\":\"Neurobiology of Learning and Memory\",\"volume\":\"215 \",\"pages\":\"Article 107974\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurobiology of Learning and Memory\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856\",\"RegionNum\":4,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BEHAVIORAL SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurobiology of Learning and Memory","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1074742724000856","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BEHAVIORAL SCIENCES","Score":null,"Total":0}
A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment
Abstract: Humans and animals can quickly learn a new strategy when a previously rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously learned strategies, a hallmark of an impaired response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning but are more abstract, more complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may also operate at the whole-brain level. Surprisingly, this yields a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, yet uses only the same parameters as conventional reinforcement learning equations. In the new rule, the reward prediction errors that normally drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered the reward or punishment. The new rule demonstrates quick adaptation in card-sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.
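For readers who want to experiment with the idea, here is a minimal Python sketch of the rule as the abstract describes it. It is not the authors' implementation: the tabular Q-learning setting, the softmax policy, and the use of the softmax probability as the "likelihood assigned to the action" are all assumptions made for illustration.

```python
import numpy as np

def softmax(q_values, beta=1.0):
    """Softmax action probabilities with inverse temperature beta."""
    e = np.exp(beta * (q_values - q_values.max()))  # subtract max for numerical stability
    return e / e.sum()

def likelihood_scaled_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, beta=1.0):
    """One Q-learning step in which the conventional reward prediction
    error is scaled by the probability the agent assigned to the chosen
    action (a hypothetical reading of the rule described in the abstract)."""
    pi = softmax(Q[s], beta)                       # likelihoods over actions in state s
    delta = r + gamma * Q[s_next].max() - Q[s, a]  # conventional reward prediction error
    Q[s, a] += alpha * pi[a] * delta               # error scaled by the action's likelihood
    return Q

# Toy usage: a confidently chosen action that is punished receives a larger
# corrective update than a rarely chosen one would.
Q = np.zeros((3, 2))
Q[0] = [2.0, 0.0]  # action 0 looks much better in state 0, so it has high likelihood
Q = likelihood_scaled_update(Q, s=0, a=0, r=-1.0, s_next=1)
print(Q[0])        # the punished, high-likelihood action's value falls more than
                   # a low-likelihood action's would under the same punishment
```

Under this reading, punishment after a high-confidence choice produces a proportionally larger correction than punishment after an exploratory choice, which is one way a likelihood-scaled error could produce the fast lose-shift adaptation the abstract reports.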
Journal description:
Neurobiology of Learning and Memory publishes articles examining the neurobiological mechanisms underlying learning and memory at all levels of analysis ranging from molecular biology to synaptic and neural plasticity and behavior. We are especially interested in manuscripts that examine the neural circuits and molecular mechanisms underlying learning, memory and plasticity in both experimental animals and human subjects.