Credit Assignment: Challenges and Opportunities in Developing Human-like Learning Agents

Proceedings of the AAAI Symposium Series. Pub Date: 2024-05-20. DOI: 10.1609/aaaiss.v3i1.31180

Thuy Ngoc Nguyen, Chase McDonald, Cleotilde Gonzalez

Citations: 0

Abstract

Temporal credit assignment is the process of distributing delayed outcomes to each action in a sequence, which is essential for learning to adapt and make decisions in dynamic environments. While computational methods in reinforcement learning, such as temporal difference (TD), have shown success in tackling this issue, it remains unclear whether these mechanisms accurately reflect how humans handle feedback delays. Furthermore, cognitive science research has not fully explored the credit assignment problem in humans and cognitive models. Our study uses a cognitive model based on Instance-Based Learning Theory (IBLT) to investigate various credit assignment mechanisms, including equal credit, exponential credit, and TD credit, using the IBL decision mechanism in a goal-seeking navigation task with feedback delays and varying levels of decision complexity. We compare the performance and process measures of the different models with human decision-making in two experiments. Our findings indicate that the human learning process cannot be fully explained by any of the mechanisms. We also observe that decision complexity affects human behavior but not model behavior. By examining the similarities and differences between human and model behavior, we summarize the challenges and opportunities for developing learning agents that emulate human decisions in dynamic environments.
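The three credit assignment schemes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas used in the paper, so everything below is an assumption for illustration only: the function names, the discount factor `gamma`, the learning rate `alpha`, and the use of a plain value table in place of the IBL decision mechanism are all hypothetical. Equal credit gives every action in the sequence the full delayed outcome, exponential credit discounts the outcome by distance from the end of the sequence, and TD credit propagates value backward one step at a time using the standard TD(0) update.

```python
from typing import List


def equal_credit(num_steps: int, outcome: float) -> List[float]:
    """Assign the full delayed outcome to every action in the sequence."""
    return [outcome] * num_steps


def exponential_credit(num_steps: int, outcome: float,
                       gamma: float = 0.9) -> List[float]:
    """Discount the outcome by how far each action is from the final step.

    The last action receives the full outcome; earlier actions receive
    outcome * gamma ** (steps remaining). gamma is an assumed parameter.
    """
    return [outcome * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


def td_update(values: List[float], rewards: List[float],
              alpha: float = 0.5, gamma: float = 0.9) -> List[float]:
    """Run one forward pass of standard TD(0) updates over an episode.

    values[t] is the current estimate for step t, rewards[t] is the
    immediate reward at step t, and the value after the final step is 0.
    """
    updated = list(values)
    for t in range(len(rewards)):
        next_value = updated[t + 1] if t + 1 < len(updated) else 0.0
        td_error = rewards[t] + gamma * next_value - updated[t]
        updated[t] += alpha * td_error
    return updated


if __name__ == "__main__":
    # A three-step episode whose only feedback arrives at the end.
    print(equal_credit(3, 1.0))
    print(exponential_credit(3, 1.0, gamma=0.5))
    first_pass = td_update([0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
    print(first_pass)
    print(td_update(first_pass, [0.0, 0.0, 1.0]))
```

Under these assumptions, the contrast is that equal and exponential credit assign the delayed outcome to all actions once the episode ends, whereas TD needs repeated episodes before the outcome reaches early actions: after one pass only the final step's estimate has changed.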
