Credit Assignment: Challenges and Opportunities in Developing Human-like Learning Agents

Thuy Ngoc Nguyen, Chase McDonald, Cleotilde Gonzalez
Published in Proceedings of the AAAI Symposium Series, 2024-05-20. DOI: 10.1609/aaaiss.v3i1.31180 (https://doi.org/10.1609/aaaiss.v3i1.31180)

Abstract

Temporal credit assignment is the process of distributing delayed outcomes to each action in a sequence, which is essential for learning to adapt and make decisions in dynamic environments. While computational methods in reinforcement learning, such as temporal difference (TD), have shown success in tackling this issue, it remains unclear whether these mechanisms accurately reflect how humans handle feedback delays. Furthermore, cognitive science research has not fully explored the credit assignment problem in humans and cognitive models. Our study uses a cognitive model based on Instance-Based Learning Theory (IBLT) to investigate various credit assignment mechanisms, including equal credit, exponential credit, and TD credit, using the IBL decision mechanism in a goal-seeking navigation task with feedback delays and varying levels of decision complexity. We compare the performance and process measures of the different models with human decision-making in two experiments. Our findings indicate that the human learning process cannot be fully explained by any of the mechanisms. We also observe that decision complexity affects human behavior but not model behavior. By examining the similarities and differences between human and model behavior, we summarize the challenges and opportunities for developing learning agents that emulate human decisions in dynamic environments.
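To make the three mechanisms named in the abstract concrete, the following is a minimal sketch of how a delayed outcome might be distributed over a sequence of actions under each of them. It is not the authors' implementation. The function names, the discount factor gamma, the learning rate alpha, and the exact form of each rule are assumptions chosen for illustration: equal credit gives every action the full outcome, exponential credit discounts the outcome by the distance to the end of the sequence, and TD credit performs a standard one-step TD(0) value update.

```python
from typing import List, Sequence


def equal_credit(n_steps: int, outcome: float) -> List[float]:
    """Assign the same delayed outcome to every action in the sequence."""
    return [outcome] * n_steps


def exponential_credit(n_steps: int, outcome: float, gamma: float = 0.9) -> List[float]:
    """Discount the outcome by gamma for each step between an action and the end.

    The last action receives the full outcome; earlier actions receive less.
    """
    return [outcome * gamma ** (n_steps - 1 - t) for t in range(n_steps)]


def td_credit(
    rewards: Sequence[float],
    values: Sequence[float],
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> List[float]:
    """Run one forward pass of TD(0) updates over a single trajectory.

    `values` holds one estimate per visited state plus the terminal state,
    so it must contain len(rewards) + 1 entries.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    updated = list(values)
    for t, reward in enumerate(rewards):
        # TD error: observed reward plus discounted next estimate, minus current estimate.
        td_error = reward + gamma * updated[t + 1] - updated[t]
        updated[t] += alpha * td_error
    return updated


if __name__ == "__main__":
    # A three-step episode whose only feedback (1.0) arrives at the end.
    print(equal_credit(3, 1.0))
    print(exponential_credit(3, 1.0, gamma=0.5))
    print(td_credit([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

In this sketch, equal and exponential credit reach every action after a single episode, whereas the TD update only changes the estimate for the step immediately before the outcome on the first pass. Earlier steps receive credit only over repeated episodes, which is one way the mechanisms can produce different learning dynamics under feedback delays.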