Learning Options from Demonstration using Skill Segmentation

2020 International SAUPEC/RobMech/PRASA Conference. Pub Date: 2020-01-01. DOI: 10.1109/SAUPEC/RobMech/PRASA48453.2020.9040988

M. Cockcroft, Shahil Mawjee, Steven D. James, Pravesh Ranchod

Citations: 3

Abstract

We present a method for learning options from segmented demonstration trajectories. The trajectories are first segmented into skills using nonparametric Bayesian clustering, and a reward function for each segment is then learned using inverse reinforcement learning. From this, a set of inferred trajectories for the demonstration is generated. Option initiation sets and termination conditions are learned from these trajectories using the one-class support vector machine clustering algorithm. We demonstrate our method in the four rooms domain, where an agent is able to autonomously discover usable options from human demonstration. Our results show that these inferred options can then be used to improve learning and planning.
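As a rough illustration of what an option consists of in this setting, the sketch below builds one from a set of trajectories for a single skill. It is not the authors' implementation: the `Option` class, the `option_from_trajectories` and `run_option` helpers, the `(row, col)` state encoding, and the greedy policy are all hypothetical, and plain set membership over visited states stands in for the one-class support vector machine that the abstract uses to learn initiation sets and termination conditions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

State = Tuple[int, int]  # (row, col) grid cell; a hypothetical state encoding

# Hypothetical primitive actions for a grid world such as four rooms.
MOVES: Dict[str, State] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


@dataclass
class Option:
    """An option: where it may start, how it acts, and where it stops."""
    can_initiate: Callable[[State], bool]      # initiation set membership
    policy: Callable[[State], str]             # state -> primitive action
    should_terminate: Callable[[State], bool]  # termination condition


def option_from_trajectories(trajectories: List[List[State]], goal: State) -> Option:
    """Build an option from the states visited in one skill's trajectories.

    Assumption: set membership replaces the one-class SVM, so a state is in
    the initiation set only if some trajectory actually visited it.
    """
    visited: Set[State] = {s for traj in trajectories for s in traj}

    def policy(state: State) -> str:
        # Greedy stand-in policy: close the row gap first, then the column gap.
        if state[0] != goal[0]:
            return "down" if goal[0] > state[0] else "up"
        return "right" if goal[1] > state[1] else "left"

    return Option(
        can_initiate=lambda s: s in visited and s != goal,
        policy=policy,
        should_terminate=lambda s: s == goal or s not in visited,
    )


def run_option(option: Option, start: State, max_steps: int = 50) -> List[State]:
    """Execute the option from `start` until it terminates; return visited states."""
    if not option.can_initiate(start):
        raise ValueError(f"option cannot be initiated in {start}")
    path = [start]
    state = start
    for _ in range(max_steps):
        if option.should_terminate(state):
            break
        dr, dc = MOVES[option.policy(state)]
        state = (state[0] + dr, state[1] + dc)
        path.append(state)
    return path
```

Given one trajectory that walks down a column and then along a row to a doorway cell, `run_option` started from the first cell follows the same cells and stops at the doorway. A learned classifier such as a one-class SVM would generalise the initiation set to nearby unvisited states, which this membership test cannot do.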
