Utilizing Human Feedback in POMDP Execution and Specification

Janine Hoelscher, Dorothea Koert, Jan Peters, J. Pajarinen
DOI: 10.1109/HUMANOIDS.2018.8625022
Published in: 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), November 2018
Citations: 7

Abstract

In many environments, robots have to handle partial observations, occlusions, and uncertainty. In such settings, a partially observable Markov decision process (POMDP) is the method of choice for planning actions. However, especially in the presence of non-expert users, open challenges still prevent mass deployment of POMDPs in human environments. To this end, we present a novel approach that addresses both incorporating user objectives during task specification and asking humans for specific information during task execution, allowing for mutual information exchange. In POMDPs, the standard way of specifying the task through a reward function is challenging for experts and even more demanding for non-experts. We present a new POMDP algorithm that maximizes the probability of task success, defined in the form of intuitive logic sentences. Moreover, we introduce targeted queries into the POMDP model, through which the robot can request specific information. In contrast, most previous approaches rely on asking for full state information, which can be cumbersome for users. Compared to previous approaches, ours is applicable to large state spaces. We evaluate the approach on a box-stacking task, both in simulation and in experiments with a 7-DOF KUKA LWR arm. The experimental results confirm that asking targeted questions significantly improves task performance and that the robot successfully maximizes the probability of task success while fulfilling user-defined task objectives.
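To give a flavor of why targeted queries help, the following is a minimal, self-contained sketch (not the paper's algorithm; the states, probabilities, and query wording are illustrative assumptions): a robot is uncertain which of three boxes is the goal box, and a single targeted yes/no question to the human sharpens its belief far more than one reading from a noisy sensor.

```python
def normalize(belief):
    """Rescale a belief so its probabilities sum to 1."""
    total = sum(belief.values())
    return {s: p / total for s, p in belief.items()}

def update(belief, likelihood):
    """Bayesian belief update: b'(s) is proportional to P(obs | s) * b(s)."""
    return normalize({s: likelihood(s) * p for s, p in belief.items()})

states = ["box_A", "box_B", "box_C"]
belief = {s: 1.0 / len(states) for s in states}  # uniform prior

# Noisy sensor (assumed model): reports the true box only 60% of the time.
def sensor_likelihood(obs):
    return lambda s: 0.6 if s == obs else 0.2

# Targeted query (assumed model): "Is the goal box A?" The human answers
# truthfully with probability 0.95.
def query_likelihood(asked, answer_yes):
    def lik(s):
        truthful = (s == asked) == answer_yes
        return 0.95 if truthful else 0.05
    return lik

b_sensor = update(belief, sensor_likelihood("box_A"))
b_query = update(belief, query_likelihood("box_A", answer_yes=True))

print(b_sensor["box_A"])  # 0.60: the sensor leaves residual uncertainty
print(b_query["box_A"])   # ~0.905: one targeted question is far more informative
```

In a full POMDP, the query would be one more action whose (information-gathering) value the planner weighs against its cost; this sketch only isolates the belief-update step that makes such questions worthwhile.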