An intrinsic reward for affordance exploration
Stephen Hart
2009 IEEE 8th International Conference on Development and Learning
Published: 2009-06-05
DOI: 10.1109/DEVLRN.2009.5175542
Citations: 16
Abstract
In this paper, we present preliminary results demonstrating how a robot can learn environmental affordances in terms of the features that predict successful control and interaction. We extend previous work in which we proposed a learning framework that allows a robot to develop a series of hierarchical, closed-loop manipulation behaviors. Here, we examine a complementary process in which the robot builds probabilistic models of the conditions under which these behaviors are likely to succeed. To accomplish this, we present an intrinsic reward function that directs the robot's exploratory behavior towards gaining confidence in these models. We demonstrate how this single intrinsic motivator can lead to behavioral artifacts such as "novelty," "habituation," and "surprise." We present results using the bimanual robot Dexter, and explore these results further in simulation.