Task Phasing: Automated Curriculum Learning from Demonstrations

Vaibhav Bajaj, Guni Sharon, P. Stone
{"title":"Task Phasing: Automated Curriculum Learning from Demonstrations","authors":"Vaibhav Bajaj, Guni Sharon, P. Stone","doi":"10.48550/arXiv.2210.10999","DOIUrl":null,"url":null,"abstract":"Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. \nCommon RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task-phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations we define a simple initial task. Our task phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on 3 sparse reward domains demonstrate that our task-phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.","PeriodicalId":239898,"journal":{"name":"International Conference on Automated Planning and Scheduling","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Automated Planning and Scheduling","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.10999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task-phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations, we define a simple initial task. Our task-phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on three sparse reward domains demonstrate that our task-phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.
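The abstract names the two phasing schemes but does not spell out how they would be realized. Below is a minimal, hedged sketch of what such a curriculum could look like; all names (`demo_policy`, `rl_policy`, `select_action`, `phased_reward`, `alpha`, `num_phases`) are hypothetical illustrations and do not come from the paper's implementation.

```python
# A minimal sketch of the two task-phasing schemes described in the abstract.
# All function and variable names here are hypothetical, not the paper's API.
import random


def demo_policy(state):
    """Hypothetical stand-in for a policy recovered from (suboptimal) demonstrations via inverse RL."""
    return 0  # placeholder action


def rl_policy(state):
    """Hypothetical stand-in for the RL agent's current policy."""
    return 1  # placeholder action


def select_action(state, alpha):
    """Scheme (1): the RL agent controls a growing fraction `alpha` of time steps;
    the demonstration-derived policy controls the remainder."""
    if random.random() < alpha:
        return rl_policy(state)
    return demo_policy(state)


def phased_reward(sparse_reward, guiding_reward, alpha):
    """Scheme (2): an informative guiding reward is annealed out as `alpha` approaches 1,
    leaving only the sparse reward of the target task."""
    return sparse_reward + (1.0 - alpha) * guiding_reward


if __name__ == "__main__":
    # Example curriculum: alpha moves from 0 (demonstration control / full guidance)
    # to 1 (unmodified target task); the RL agent would be retuned at each phase.
    num_phases = 10
    for phase in range(num_phases + 1):
        alpha = phase / num_phases
        print(f"phase {phase:2d}: RL-control fraction = {alpha:.1f}, "
              f"guiding-reward weight = {1.0 - alpha:.1f}")
```

In both schemes the phasing parameter interpolates from a demonstration-guided task at alpha = 0 to the unmodified target task at alpha = 1, matching the abstract's description of gradually increasing task complexity until the target task is reached.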