State-transition-free reinforcement learning in chimpanzees (Pan troglodytes)
Authors: Yutaro Sato, Yutaka Sakai, Satoshi Hirata
Journal: Learning & Behavior
Publication type: Journal Article
Published: 2023-12-01 (Epub 2023-06-27)
DOI: 10.3758/s13420-023-00591-3 (https://doi.org/10.3758/s13420-023-00591-3)
Abstract
The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that ET may be used in human learning systems. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants' choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees were able to learn successfully, indicating that learning mechanisms that do not depend on state transitions were involved in the learning processes. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.
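The eligibility-trace idea described above can be illustrated with a small simulation. The Python sketch below is only an illustration under stated assumptions, not the authors' model or analysis. The class name `TraceLearner`, the function `simulate`, the softmax choice rule, the reward size of 1, the delay of 3 trials, and the parameter values `alpha`, `lam`, and `beta` are all hypothetical choices; the abstract only says the delayed reward arrives "after a few trials". Each choice leaves a decaying trace, and every outcome is credited to recently chosen stimuli in proportion to their traces, with no model of state transitions.

```python
import math
import random

STIMULI = ("immediate", "delayed", "none")


class TraceLearner:
    """Stimulus values updated through decaying eligibility traces (no state model)."""

    def __init__(self, alpha=0.1, lam=0.7):
        self.alpha = alpha  # learning rate (assumed value)
        self.lam = lam      # per-trial trace decay (assumed value)
        self.value = {s: 0.0 for s in STIMULI}
        self.trace = {s: 0.0 for s in STIMULI}

    def update(self, chosen, reward):
        # Decay every trace, then mark the current choice as fully eligible.
        for s in STIMULI:
            self.trace[s] *= self.lam
        self.trace[chosen] = 1.0
        # Credit the current outcome to all stimuli in proportion to their traces.
        for s in STIMULI:
            self.value[s] += self.alpha * self.trace[s] * (reward - self.value[s])


def simulate(n_trials=2000, delay=3, alpha=0.1, lam=0.7, beta=3.0, seed=0):
    """Run the hypothetical task; assumes delay >= 1. Returns the learned values."""
    rng = random.Random(seed)
    learner = TraceLearner(alpha, lam)
    due = {}  # trial index -> delayed reward scheduled for that trial
    for t in range(n_trials):
        # Two of the three stimulus types are shown, independent of past choices.
        left, right = rng.sample(STIMULI, 2)
        diff = learner.value[left] - learner.value[right]
        p_left = 1.0 / (1.0 + math.exp(-beta * diff))  # softmax over two options
        chosen = left if rng.random() < p_left else right
        reward = due.pop(t, 0.0)  # delayed reward from an earlier choice, if any
        if chosen == "immediate":
            reward += 1.0
        elif chosen == "delayed":
            due[t + delay] = 1.0
        learner.update(chosen, reward)
    return learner.value


if __name__ == "__main__":
    print(simulate())
```

A two-trial example shows the credit assignment. With `alpha=0.5` and `lam=0.5`, choosing "delayed" with no reward and then choosing "none" on a trial where a reward of 1 arrives raises the value of "delayed" to 0.25 and the value of "none" to 0.5. With `lam=0.0` the trace vanishes after one trial, so "delayed" receives no credit at all and the reward is attributed only to whatever was chosen when it arrived.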
About the journal:
Learning & Behavior publishes experimental and theoretical contributions and critical reviews concerning fundamental processes of learning and behavior in nonhuman and human animals. Topics covered include sensation, perception, conditioning, learning, attention, memory, motivation, emotion, development, social behavior, and comparative investigations.