Extinction burst could be explained by curiosity-driven reinforcement learning
Kota Yamada, Hiroshi Matsui, Koji Toda
bioRxiv - Animal Behavior and Cognition, 2024-08-29
DOI: 10.1101/2024.08.28.610088 (https://doi.org/10.1101/2024.08.28.610088)
Abstract
Curiosity encourages agents to explore their environment, which creates opportunities for learning. Although psychology and neurobiology have examined how external rewards control behavior, how intrinsic factors do so remains unclear. An extinction burst is a behavioral phenomenon in which the frequency of a behavior increases suddenly immediately after a reward is omitted. Although the extinction burst is textbook knowledge in psychology, there is little empirical evidence for it in experimental settings. In this study, we show that the extinction burst can be explained by curiosity, combining computational modeling of behavior with empirical demonstrations in mice. First, we built a reinforcement learning model incorporating curiosity, defined as expected reward prediction errors, which controlled the agent's behavior additively with the primary reward. Simulations revealed that the curiosity-driven reinforcement learning model produced an extinction burst and that burst intensity depended on the reward probability. Second, we established a behavioral procedure that captured extinction bursts in an experimental setup using mice. We conducted an operant conditioning task with head-fixed mice in which a lever press was followed by a reward with a given probability. After the training sessions, we occasionally withheld reward delivery while the mice performed the task. We found that phasic bursts of responses occurred immediately after reward omission when responses had been rewarded with a high probability, suggesting that the magnitude of reward prediction errors controlled the burst. These results provide theoretical and experimental evidence that intrinsic factors control behavior in adapting to an ever-changing environment.
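
The modeling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model; it is a toy under stated assumptions. Curiosity is approximated here by a running average of the unsigned reward prediction error, the agent is assumed to start extinction already at its training asymptote (value equal to the reward probability, curiosity equal to the expected unsigned error 2p(1-p)), and the function names, learning rates `alpha_q` and `alpha_c`, and weight `w` are all hypothetical choices made for illustration.

```python
def extinction_drive(p_reward, n_trials=30, alpha_q=0.1, alpha_c=0.3, w=1.0):
    """Response drive (value + w * curiosity) across extinction trials.

    Assumptions (not taken from the paper): the agent starts at its training
    asymptote, and curiosity tracks a running mean of |reward prediction error|.
    """
    q = p_reward                                   # asymptotic reward estimate
    curiosity = 2.0 * p_reward * (1.0 - p_reward)  # expected |RPE| under Bernoulli(p)
    drive = [q + w * curiosity]                    # index 0: pre-extinction baseline

    for _ in range(n_trials):
        delta = 0.0 - q                            # reward omitted, so RPE = -q
        q += alpha_q * delta                       # value decays toward zero
        curiosity += alpha_c * (abs(delta) - curiosity)  # track unsigned RPE
        drive.append(q + w * curiosity)            # additive combination
    return drive


def burst_size(p_reward, **kwargs):
    """Peak drive during extinction minus the pre-extinction baseline."""
    drive = extinction_drive(p_reward, **kwargs)
    return max(drive) - drive[0]


if __name__ == "__main__":
    for p in (0.5, 0.9):
        print(f"p_reward={p}: burst size = {burst_size(p):.3f}")
```

With these illustrative parameters, the drive rises transiently above baseline when the reward probability is 0.9 and declines monotonically when it is 0.5. The reason is that omission produces an unsigned error much larger than the one the agent had come to expect only when reward was nearly certain. This is qualitatively in line with the dependence on reward probability described above, though the specific numbers carry no empirical meaning.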