State-transition-free reinforcement learning in chimpanzees (Pan troglodytes).

Impact factor: 1.9; Zone 4 (Psychology); JCR Q3 (Behavioral Sciences)
Learning & Behavior. Pub Date: 2023-12-01. Epub Date: 2023-06-27. DOI: 10.3758/s13420-023-00591-3
Yutaro Sato, Yutaka Sakai, Satoshi Hirata
Citations: 0

Abstract

The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that ET may be used in human learning systems. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants' choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees were able to learn successfully, indicating that learning mechanisms that do not depend on state transitions were involved in the learning processes. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.
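
The eligibility-trace idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model or analysis code: the function name `run_learner`, the update rule, and the parameter values (`delay`, `alpha`, `lam`) are assumptions chosen for illustration, and the abstract states only that the delayed reward arrives "after a few trials". Each stimulus type keeps a decaying trace of how recently it was chosen, and every reward is credited to all stimuli in proportion to their current trace, with no reference to state transitions.

```python
from collections import deque

STIMULI = ("immediate", "delayed", "none")


def run_learner(choices, delay=2, alpha=0.1, lam=0.8):
    """Return learned stimulus values after a fixed sequence of choices.

    Hypothetical sketch: delay, alpha (learning rate) and lam (trace decay)
    are illustrative assumptions, not values taken from the study.
    """
    values = {s: 0.0 for s in STIMULI}
    traces = {s: 0.0 for s in STIMULI}
    # pending[i] holds the reward that will be delivered i trials from now.
    pending = deque([0.0] * (delay + 1))

    for choice in choices:
        # Schedule the reward produced by the current choice.
        if choice == "immediate":
            pending[0] += 1.0
        elif choice == "delayed":
            pending[delay] += 1.0

        # Decay every trace, then mark the stimulus chosen on this trial.
        for s in STIMULI:
            traces[s] *= lam
        traces[choice] = 1.0

        # Deliver whatever reward is due on this trial.
        reward = pending.popleft()
        pending.append(0.0)

        # Credit the reward to every stimulus in proportion to its trace.
        for s in STIMULI:
            values[s] += alpha * traces[s] * (reward - values[s])

    return values


# One delayed choice followed by two unrewarded choices; the reward
# from the first choice arrives on the third trial.
sequence = ["delayed", "none", "none"]
with_trace = run_learner(sequence, lam=0.8)
without_trace = run_learner(sequence, lam=0.0)
print(with_trace["delayed"], without_trace["delayed"])
```

With a nonzero decay parameter the delayed stimulus still holds some trace when its reward arrives, so its value rises. With `lam=0.0` only the stimulus chosen on the rewarded trial is credited, so the delayed stimulus gains nothing from its own reward. In this short sequence the "none" stimulus also picks up spurious credit because it happened to be chosen when the reward arrived; in a task where the presented stimuli are randomized, such coincidental credit would be expected to average out over many trials.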

Source journal: Learning & Behavior (category: Medicine, Zoology)
CiteScore: 2.90
Self-citation rate: 5.60%
Articles published: 50
Review time: >12 weeks
About the journal: Learning & Behavior publishes experimental and theoretical contributions and critical reviews concerning fundamental processes of learning and behavior in nonhuman and human animals. Topics covered include sensation, perception, conditioning, learning, attention, memory, motivation, emotion, development, social behavior, and comparative investigations.