{"title":"Learning Sequential Decision Tasks for Robot Manipulation with Abstract Markov Decision Processes and Demonstration-Guided Exploration","authors":"Cassandra Kent, Siddhartha Banerjee, S. Chernova","doi":"10.1109/HUMANOIDS.2018.8624949","DOIUrl":null,"url":null,"abstract":"Solving high-level sequential decision tasks situated on physical robots is a challenging problem. Reinforcement learning, the standard paradigm for solving sequential decision problems, allows robots to learn directly from experience, but is ill-equipped to deal with issues of scalability and uncertainty introduced by real-world tasks. We reformulate the problem representation to better apply to robot manipulation using the relations of Object-Oriented MDPs (OO-MDPs) and the hierarchical structure provided by Abstract MDPs (AMDPs). We present a relation-based AMDP formulation for solving tabletop organizational packing tasks, as well as a demonstration-guided exploration algorithm for learning AMDP transition functions inspired by state- and action-centric learning from demonstration approaches. We evaluate our representation and learning methods in a simulated environment, showing that our hierarchical representation is suitable for solving complex tasks, and that our state- and action-centric exploration biasing methods are both effective and complementary for efficiently learning AMDP transition functions. We show that the learned policy can be transferred to different tabletop organizational packing tasks, and validate that the policy can be realized on a physical system.","PeriodicalId":433345,"journal":{"name":"2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HUMANOIDS.2018.8624949","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Solving high-level sequential decision tasks situated on physical robots is a challenging problem. Reinforcement learning, the standard paradigm for solving sequential decision problems, allows robots to learn directly from experience, but is ill-equipped to deal with issues of scalability and uncertainty introduced by real-world tasks. We reformulate the problem representation to better apply to robot manipulation using the relations of Object-Oriented MDPs (OO-MDPs) and the hierarchical structure provided by Abstract MDPs (AMDPs). We present a relation-based AMDP formulation for solving tabletop organizational packing tasks, as well as a demonstration-guided exploration algorithm for learning AMDP transition functions inspired by state- and action-centric learning from demonstration approaches. We evaluate our representation and learning methods in a simulated environment, showing that our hierarchical representation is suitable for solving complex tasks, and that our state- and action-centric exploration biasing methods are both effective and complementary for efficiently learning AMDP transition functions. We show that the learned policy can be transferred to different tabletop organizational packing tasks, and validate that the policy can be realized on a physical system.
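The abstract describes biasing exploration toward demonstrated behavior while learning AMDP transition functions from experience. As a rough illustration only, not the authors' implementation, the sketch below shows one way per-state demonstration biasing and maximum-likelihood transition estimation could be combined; the state/action names, the `demos` data, and the `bias` hyperparameter are all hypothetical.

```python
import random
from collections import defaultdict

# Toy abstract actions for a tabletop packing task (illustrative names
# only; the paper's actual AMDP uses richer OO-MDP relations).
ACTIONS = ["pick", "place_in_box", "place_on_table"]

# Demonstrations as (state, action) pairs, e.g. provided by a human teacher.
demos = [("items_on_table", "pick"), ("item_grasped", "place_in_box")]

# Per-state counts of demonstrated actions (the exploration bias).
demo_counts = defaultdict(lambda: defaultdict(int))
for s, a in demos:
    demo_counts[s][a] += 1

# Experience counts for a maximum-likelihood transition model T(s' | s, a).
trans_counts = defaultdict(lambda: defaultdict(int))

def record_transition(s, a, s_next):
    """Accumulate one observed (s, a) -> s' outcome."""
    trans_counts[(s, a)][s_next] += 1

def transition_prob(s, a, s_next):
    """Estimate T(s' | s, a) as a normalized visitation frequency."""
    total = sum(trans_counts[(s, a)].values())
    return trans_counts[(s, a)][s_next] / total if total else 0.0

def biased_explore(state, bias=0.8):
    """Sample an exploratory action. With probability `bias` (an assumed
    hyperparameter), draw in proportion to how often the demonstrator
    chose each action in this state; otherwise explore uniformly."""
    counts = demo_counts.get(state)
    if counts and random.random() < bias:
        actions, weights = zip(*counts.items())
        return random.choices(actions, weights=weights)[0]
    return random.choice(ACTIONS)
```

In a tabular learning loop, `biased_explore` would replace the uniform random-action step and `record_transition` would accumulate the counts from which the abstract-level transition function is estimated, so exploration is concentrated on state-action pairs the demonstrations suggest are relevant.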