From pixels to policies: A bootstrapping agent

J. Stober, B. Kuipers
Published in: 2008 7th IEEE International Conference on Development and Learning (2008-10-10)
DOI: 10.1109/DEVLRN.2008.4640813 (https://doi.org/10.1109/DEVLRN.2008.4640813)
Citations: 26

Abstract

An embodied agent senses the world at the pixel level through a large number of sense elements. In order to function intelligently, an agent needs high-level concepts, grounded in the pixel level. For human designers to program these concepts and their grounding explicitly is almost certainly intractable, so the agent must learn these foundational concepts autonomously. We describe an approach by which an autonomous learning agent can bootstrap its way from pixel-level interaction with the world, to individuating and tracking objects in the environment, to learning an effective policy for its behavior. We use methods drawn from computational scientific discovery to identify derived variables that support simplified models of the dynamics of the environment. These derived variables are abstracted to discrete qualitative variables, which serve as features for temporal difference learning. Our method bridges the gap between the continuous tracking of objects and the discrete state representation necessary for efficient and effective learning. We demonstrate and evaluate this approach with an agent experiencing a simple simulated world, through a sensory interface consisting of 60,000 time-varying binary variables in a 200 x 300 array, plus a three-valued motor signal and a real-valued reward signal.
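
The last stage of the pipeline, in which a continuous derived variable is abstracted to a qualitative value that serves as the state for temporal difference learning, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the derived variables, the landmark values, or the TD variant. The derived variable used here (the signed offset between the agent and a tracked object), the landmarks at -0.5 and 0.5, the encoding of the three-valued motor signal as -1, 0, 1, the reward, and the choice of tabular Q-learning are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)        # assumed encoding of the three-valued motor signal
LANDMARKS = (-0.5, 0.5)     # hypothetical landmarks splitting the offset into 3 intervals


def qualitative(x, landmarks):
    """Map a continuous value to the index of the interval it falls in."""
    return sum(x > lm for lm in landmarks)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning (temporal difference) update; return the TD error."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def run_episode(q, steps=50, epsilon=0.1, rng=random):
    """Toy 1-D world: the agent moves along a line toward a fixed target.

    The derived variable is the signed offset target - agent (an assumption);
    the learner only ever sees its qualitative value: 0, 1, or 2.
    """
    agent, target = 0.0, rng.choice((-5.0, 5.0))
    for _ in range(steps):
        state = qualitative(target - agent, LANDMARKS)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                         # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
        agent += action
        reward = -abs(target - agent)    # hypothetical reward: closer is better
        next_state = qualitative(target - agent, LANDMARKS)
        q_update(q, state, action, reward, next_state)


q_table = defaultdict(float)   # Q-values keyed by (qualitative state, action)
```

Calling `run_episode(q_table)` repeatedly fills a table with at most nine entries (three qualitative states times three actions), however many distinct continuous offsets occur. That compression from a continuous quantity to a handful of discrete states is the point the abstract makes about efficient learning.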