Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

Yash Shukla, A. Kulkarni, R. Wright, Alvaro Velasquez, J. Sinapov
{"title":"Automaton-Guided Curriculum Generation for Reinforcement Learning Agents","authors":"Yash Shukla, A. Kulkarni, R. Wright, Alvaro Velasquez, J. Sinapov","doi":"10.48550/arXiv.2304.05271","DOIUrl":null,"url":null,"abstract":"Despite advances in Reinforcement Learning, many sequential decision making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from the logical specification still remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where the vertices correspond to tasks, and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g, teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g, Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Automated Planning and Scheduling","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.05271","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Despite advances in Reinforcement Learning, many sequential decision-making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent must perform a series of correct actions to reach the goal state, accounting for future transitions when choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the agent's learning speed by sequencing intermediate tasks suited to its learning capacity. However, generating curricula from the logical specification remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification as a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where vertices correspond to tasks and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g., teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g., Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives.
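To make the DFA-to-curriculum-DAG idea concrete, the following is a minimal Python sketch, not the authors' implementation: it encodes a hypothetical two-step task specification ("pick up key, then open door") as a DFA and derives a curriculum DAG whose vertices are sub-tasks (one per DFA state) and whose edges indicate the direction of knowledge transfer. All names and the toy task are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical DFA for the specification "pick up key, then open door".
# States: 0 = start, 1 = has key, 2 = door open (accepting state).
dfa_transitions = {
    (0, "got_key"): 1,
    (1, "opened_door"): 2,
}
accepting_state = 2

def curriculum_dag(transitions):
    """Derive a curriculum DAG from DFA transitions: one sub-task per
    DFA state, with an edge u -> v whenever some event advances the DFA
    from u to v, so knowledge learned for the sub-task ending at u is
    transferred to the sub-task ending at v."""
    edges = defaultdict(set)
    for (state, _event), nxt in transitions.items():
        if state != nxt:  # ignore self-loops; they add no new sub-task
            edges[state].add(nxt)
    return {u: sorted(vs) for u, vs in edges.items()}

if __name__ == "__main__":
    # An agent would train sub-tasks in topological order of this DAG,
    # transferring policies/value functions along each edge (omitted).
    print(curriculum_dag(dfa_transitions))  # {0: [1], 1: [2]}
```

In AGCL itself, the DAG construction additionally consults the OOMDP description to instantiate each vertex as a concrete source task; the sketch above only shows the graph-structural step.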