Compositional Transfer in Hierarchical Reinforcement Learning

Markus Wulfmeier, A. Abdolmaleki, Roland Hafner, J. T. Springenberg, Michael Neunert, Tim Hertweck, T. Lampe, Noah Siegel, N. Heess, Martin A. Riedmiller
{"title":"Compositional Transfer in Hierarchical Reinforcement Learning","authors":"Markus Wulfmeier, A. Abdolmaleki, Roland Hafner, J. T. Springenberg, Michael Neunert, Tim Hertweck, T. Lampe, Noah Siegel, N. Heess, Martin A. Riedmiller","doi":"10.15607/rss.2020.xvi.054","DOIUrl":null,"url":null,"abstract":"The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks as well as scheduling of tasks. The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference and evaluate the benefits of additional incentives for efficient, compositional task solutions in single task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long, physical robot stacking experiment.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"132 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv: Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15607/rss.2020.xvi.054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30

Abstract

The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks as well as scheduling of tasks. The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference and evaluate the benefits of additional incentives for efficient, compositional task solutions in single task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long, physical robot stacking experiment.
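To make the policy structure referenced in the abstract more concrete, the sketch below shows one plausible realization of a compositional, hierarchical policy: a task-conditioned high-level categorical controller that weights a set of low-level Gaussian components shared across all tasks. This is an illustration only, not the authors' implementation; all names and hyperparameters here (MixtureOfGaussiansPolicy, num_components, hidden=256, and so on) are assumptions made for the example, and RHPO's regularized off-policy optimization and task scheduling are omitted.

```python
# Minimal sketch (illustrative, not the authors' code) of a hierarchical
# mixture policy: shared low-level Gaussian components, task-conditioned gating.
import torch
import torch.nn as nn


class MixtureOfGaussiansPolicy(nn.Module):
    """pi(a | s, task) = sum_k pi_hi(k | s, task) * N(a; mu_k(s), sigma_k(s))."""

    def __init__(self, state_dim, action_dim, num_tasks, num_components, hidden=256):
        super().__init__()
        # Low-level components: shared across every task (the compositional bias).
        self.components = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * action_dim),   # mean and log-std per component
            )
            for _ in range(num_components)
        ])
        # High-level controller: task-conditioned categorical over components.
        self.gate = nn.Sequential(
            nn.Linear(state_dim + num_tasks, hidden), nn.ReLU(),
            nn.Linear(hidden, num_components),
        )

    def forward(self, state, task_onehot):
        # Mixture weights depend on the task; the components see only the state.
        weights = torch.softmax(self.gate(torch.cat([state, task_onehot], -1)), -1)
        mus, stds = [], []
        for net in self.components:
            mu, log_std = net(state).chunk(2, dim=-1)
            mus.append(mu)
            stds.append(log_std.clamp(-5.0, 2.0).exp())
        return weights, torch.stack(mus, dim=-2), torch.stack(stds, dim=-2)

    def sample(self, state, task_onehot):
        weights, mu, std = self(state, task_onehot)
        k = torch.distributions.Categorical(probs=weights).sample()   # [B]
        idx = k.view(-1, 1, 1).expand(-1, 1, mu.size(-1))             # [B, 1, A]
        mu_k = mu.gather(dim=-2, index=idx).squeeze(-2)
        std_k = std.gather(dim=-2, index=idx).squeeze(-2)
        return torch.distributions.Normal(mu_k, std_k).sample()


if __name__ == "__main__":
    # Example: 3 tasks sharing 5 low-level components.
    policy = MixtureOfGaussiansPolicy(state_dim=10, action_dim=4,
                                      num_tasks=3, num_components=5)
    s = torch.randn(8, 10)
    task = nn.functional.one_hot(torch.randint(0, 3, (8,)), 3).float()
    a = policy.sample(s, task)   # -> actions of shape [8, 4]
```

Because the low-level components are shared, off-policy transitions gathered while pursuing one task can, under this kind of structure, be reused to train the components for all tasks; only the lightweight gating network is task-specific.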