Learning something from nothing: Leveraging implicit human feedback strategies

R. Loftin, Bei Peng, J. MacGlashan, M. Littman, Matthew E. Taylor, Jeff Huang, D. Roberts
{"title":"Learning something from nothing: Leveraging implicit human feedback strategies","authors":"R. Loftin, Bei Peng, J. MacGlashan, M. Littman, Matthew E. Taylor, Jeff Huang, D. Roberts","doi":"10.1109/ROMAN.2014.6926319","DOIUrl":null,"url":null,"abstract":"In order to be useful in real-world situations, it is critical to allow non-technical users to train robots. Existing work has considered the problem of a robot or virtual agent learning behaviors from evaluative feedback provided by a human trainer. That work, however, has treated feedback as a numeric reward that the agent seeks to maximize, and has assumed that all trainers will provide feedback in the same way when teaching the same behavior. We report the results of a series of user studies that indicate human trainers use a variety of approaches to providing feedback in practice, which we describe as different “training strategies.” For example, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa. If the trainer is consistent in their strategy, then it may be possible to infer knowledge about the desired behavior from cases where no explicit feedback is provided. We discuss a probabilistic model of human-provided feedback that can be used to classify these different training strategies based on when the trainer chooses to provide explicit reward and/or explicit punishment, and when they choose to provide no feedback. Additionally, we investigate how training strategies may change in response to the appearance of the learning agent. Ultimately, based on this work, we argue that learning agents designed to understand and adapt to different users' training strategies will allow more efficient and intuitive learning experiences.","PeriodicalId":235810,"journal":{"name":"The 23rd IEEE International Symposium on Robot and Human Interactive Communication","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 23rd IEEE International Symposium on Robot and Human Interactive Communication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROMAN.2014.6926319","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 28

Abstract

In order to be useful in real-world situations, it is critical to allow non-technical users to train robots. Existing work has considered the problem of a robot or virtual agent learning behaviors from evaluative feedback provided by a human trainer. That work, however, has treated feedback as a numeric reward that the agent seeks to maximize, and has assumed that all trainers will provide feedback in the same way when teaching the same behavior. We report the results of a series of user studies that indicate human trainers use a variety of approaches to providing feedback in practice, which we describe as different “training strategies.” For example, users may not always give explicit feedback in response to an action, and may be more likely to provide explicit reward than explicit punishment, or vice versa. If the trainer is consistent in their strategy, then it may be possible to infer knowledge about the desired behavior from cases where no explicit feedback is provided. We discuss a probabilistic model of human-provided feedback that can be used to classify these different training strategies based on when the trainer chooses to provide explicit reward and/or explicit punishment, and when they choose to provide no feedback. Additionally, we investigate how training strategies may change in response to the appearance of the learning agent. Ultimately, based on this work, we argue that learning agents designed to understand and adapt to different users' training strategies will allow more efficient and intuitive learning experiences.
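The paper develops the probabilistic model of trainer feedback in full; as a rough illustration of the idea, the sketch below assumes a simple Bernoulli feedback model in which each training strategy is summarized by two "silence" probabilities (how often the trainer gives no feedback after a correct vs. an incorrect action), and Bayesian updating over a uniform prior classifies the trainer's strategy from observed feedback. The strategy names, parameter values, and error rate `eps` here are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical feedback model: after each action the trainer either gives
# explicit reward (+1), explicit punishment (-1), or stays silent (0).
# A "training strategy" is characterized by two silence probabilities:
#   mu_plus  = P(no feedback | correct action)
#   mu_minus = P(no feedback | incorrect action)

STRATEGIES = {
    # Illustrative parameter values, not taken from the paper.
    "reward-focused":     {"mu_plus": 0.1, "mu_minus": 0.9},
    "punishment-focused": {"mu_plus": 0.9, "mu_minus": 0.1},
    "balanced":           {"mu_plus": 0.1, "mu_minus": 0.1},
    "inactive":           {"mu_plus": 0.9, "mu_minus": 0.9},
}

def feedback_likelihood(feedback, action_correct, mu_plus, mu_minus):
    """P(observed feedback | correctness of the action, strategy params).

    feedback: +1 explicit reward, -1 explicit punishment, 0 silence.
    A small error rate eps allows for occasional mistaken feedback.
    """
    eps = 0.05  # assumed probability of erroneous explicit feedback
    mu = mu_plus if action_correct else mu_minus
    if feedback == 0:
        return mu
    # Explicit feedback was given (probability 1 - mu); it matches the
    # action's actual correctness with probability 1 - eps.
    matches = (feedback == +1) == action_correct
    return (1.0 - mu) * ((1.0 - eps) if matches else eps)

def classify_strategy(history):
    """Posterior over strategies given (feedback, action_correct) pairs,
    starting from a uniform prior over the candidate strategies."""
    posterior = {name: 1.0 / len(STRATEGIES) for name in STRATEGIES}
    for feedback, correct in history:
        for name, params in STRATEGIES.items():
            posterior[name] *= feedback_likelihood(
                feedback, correct, params["mu_plus"], params["mu_minus"])
    z = sum(posterior.values())
    return {name: v / z for name, v in posterior.items()}

# Usage: a trainer who rewards correct actions but stays silent on
# incorrect ones should look reward-focused.
history = [(+1, True), (0, False), (+1, True), (0, False), (0, False)]
print(classify_strategy(history))
```

Under a model like this, silence is informative: once the agent believes the trainer is reward-focused, the absence of reward after an action is itself weak evidence that the action was wrong, which is one way an agent can learn something from no explicit feedback.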