Learning from Interventions: Human-robot interaction as both explicit and implicit feedback

Robotics: Science and Systems XVI. Published 2020-07-12. DOI: 10.15607/rss.2020.xvi.055

Jonathan Spencer, Sanjiban Choudhury, Matt Barnes, Matt Schmittle, M. Chiang, P. Ramadge, S. Srinivasa


Citations: 33

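Abstract

Scalable robot learning from seamless human-robot interaction is critical if robots are to solve a multitude of tasks in the real world. Current approaches to imitation learning suffer from one of two drawbacks. On the one hand, they rely solely on off-policy human demonstrations, which in some cases leads to a mismatch between the training and test distributions. On the other, they burden the human with labeling every state the learner visits, which makes them impractical in many applications. We argue that learning interactively from expert interventions enjoys the best of both worlds. Our key insight is that any amount of expert feedback, whether by intervention or non-intervention, provides information about the quality of the current state, the optimality of the action, or both. We formalize this as a constraint on the learner's value function, which we can efficiently learn using no-regret online learning techniques. We call our approach Expert Intervention Learning (EIL) and evaluate it with a human expert on a driving task, both real and simulated, where it learns collision avoidance from scratch with just a few hundred samples (about one minute) of expert control.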
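The abstract's key idea, that both intervention and non-intervention constrain the learner's value function, can be illustrated with a small sketch. This is not the authors' EIL implementation. It assumes a linear cost-to-go Q(s, a) = w · phi(s, a) (lower is better), a hypothetical "good enough" threshold `tau`, a `margin`, and three hypothetical constraint kinds: `"good"` for state-action pairs the expert let pass, `"bad"` for pairs that triggered an intervention, and `"prefer"` for the expert's explicit corrective action versus the learner's own. Each constraint becomes a hinge loss, and the weights are updated with online subgradient descent, one standard no-regret learner for convex losses.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


@dataclass
class Constraint:
    kind: str                              # hypothetical labels: "good", "bad", or "prefer"
    phi: Vector                            # phi(s, a) for the learner's state-action pair
    phi_expert: Optional[Vector] = None    # phi(s, a*) for the expert's action ("prefer" only)


def hinge_and_grad(w: Vector, c: Constraint, tau: float, margin: float) -> Tuple[float, Vector]:
    """Hinge loss of one constraint on Q(s, a) = w . phi(s, a) and its subgradient in w."""
    q = dot(w, c.phi)
    if c.kind == "good":
        # Implicit feedback (no intervention): cost-to-go should be <= tau - margin.
        loss, grad = q - (tau - margin), list(c.phi)
    elif c.kind == "bad":
        # Implicit feedback (intervention triggered): cost-to-go should be >= tau + margin.
        loss, grad = (tau + margin) - q, [-x for x in c.phi]
    elif c.kind == "prefer":
        # Explicit feedback: the expert's action should cost at least `margin` less.
        if c.phi_expert is None:
            raise ValueError("'prefer' constraints need phi_expert")
        loss = dot(w, c.phi_expert) - q + margin
        grad = [e - x for e, x in zip(c.phi_expert, c.phi)]
    else:
        raise ValueError(f"unknown constraint kind: {c.kind!r}")
    if loss <= 0.0:
        return 0.0, [0.0] * len(w)  # constraint satisfied: no penalty, zero subgradient
    return loss, grad


def online_pass(w: Vector, data: List[Constraint], tau: float, margin: float,
                lr: float) -> Tuple[Vector, float]:
    """One pass of online subgradient descent; returns new weights and summed loss."""
    total = 0.0
    for c in data:
        loss, g = hinge_and_grad(w, c, tau, margin)
        total += loss
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, total


def train(data: List[Constraint], dim: int, tau: float = 1.0, margin: float = 0.1,
          lr: float = 0.1, epochs: int = 100) -> Vector:
    w = [0.0] * dim
    for _ in range(epochs):
        w, total = online_pass(w, data, tau, margin, lr)
        if total == 0.0:  # a full pass violated no constraint: stop early
            break
    return w


# Hypothetical features: [proximity to nearest obstacle, bias term].
data = [
    Constraint("good", [0.0, 1.0]),
    Constraint("good", [0.2, 1.0]),
    Constraint("bad", [1.0, 1.0]),
    Constraint("bad", [0.9, 1.0]),
    Constraint("prefer", [1.0, 1.0], phi_expert=[0.1, 1.0]),
]
w = train(data, dim=2)
print([round(x, 3) for x in w])  # → [0.57, 0.6]
```

On this toy data the loop stops once a full pass leaves every constraint satisfied, so non-intervened states end up below `tau`, intervened states above it, and the expert's action cheaper than the learner's. The hinge form is just one convex choice that keeps the online update no-regret; the paper's own losses and function class may differ.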


