Utilizing Human Feedback in POMDP Execution and Specification

Janine Hoelscher, Dorothea Koert, Jan Peters, J. Pajarinen
DOI: 10.1109/HUMANOIDS.2018.8625022
Published in: 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), November 2018
Citations: 7

Abstract

In many environments, robots have to handle partial observations, occlusions, and uncertainty. In such settings, a partially observable Markov decision process (POMDP) is the method of choice for planning actions. However, especially in the presence of non-expert users, open challenges still prevent mass deployment of POMDPs in human environments. To this end, we present a novel approach that addresses both incorporating user objectives during task specification and asking humans for specific information during task execution, allowing for mutual information exchange. In POMDPs, the standard way of specifying the task through a reward function is challenging for experts and even more demanding for non-experts. We present a new POMDP algorithm that maximizes the probability of task success, defined in the form of intuitive logic sentences. Moreover, we introduce targeted queries into the POMDP model, through which the robot can request specific information. In contrast, most previous approaches rely on asking for full state information, which can be cumbersome for users. Compared to previous approaches, ours is applicable to large state spaces. We evaluate the approach on a box-stacking task, both in simulation and in experiments with a 7-DOF KUKA LWR arm. The experimental results confirm that asking targeted questions significantly improves task performance and that the robot successfully maximizes the probability of task success while fulfilling user-defined task objectives.
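To give a flavor of why targeted queries help, the following is a minimal, self-contained sketch (not the paper's algorithm; the states, probabilities, and query wording are illustrative assumptions): a robot is uncertain which of three boxes is the goal box, and a single targeted yes/no question to the human sharpens its belief far more than one reading from a noisy sensor.

```python
def normalize(belief):
    """Rescale a belief so its probabilities sum to 1."""
    total = sum(belief.values())
    return {s: p / total for s, p in belief.items()}

def update(belief, likelihood):
    """Bayesian belief update: b'(s) is proportional to P(obs | s) * b(s)."""
    return normalize({s: likelihood(s) * p for s, p in belief.items()})

states = ["box_A", "box_B", "box_C"]
belief = {s: 1.0 / len(states) for s in states}  # uniform prior

# Noisy sensor (assumed model): reports the true box only 60% of the time.
def sensor_likelihood(obs):
    return lambda s: 0.6 if s == obs else 0.2

# Targeted query (assumed model): "Is the goal box A?" The human answers
# truthfully with probability 0.95.
def query_likelihood(asked, answer_yes):
    def lik(s):
        truthful = (s == asked) == answer_yes
        return 0.95 if truthful else 0.05
    return lik

b_sensor = update(belief, sensor_likelihood("box_A"))
b_query = update(belief, query_likelihood("box_A", answer_yes=True))

print(b_sensor["box_A"])  # 0.60: the sensor leaves residual uncertainty
print(b_query["box_A"])   # ~0.905: one targeted question is far more informative
```

In a full POMDP, the query would be one more action whose (information-gathering) value the planner weighs against its cost; this sketch only isolates the belief-update step that makes such questions worthwhile.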