{"title":"从像素到策略:一个引导代理","authors":"J. Stober, B. Kuipers","doi":"10.1109/DEVLRN.2008.4640813","DOIUrl":null,"url":null,"abstract":"An embodied agent senses the world at the pixel level through a large number of sense elements. In order to function intelligently, an agent needs high-level concepts, grounded in the pixel level. For human designers to program these concepts and their grounding explicitly is almost certainly intractable, so the agent must learn these foundational concepts autonomously. We describe an approach by which an autonomous learning agent can bootstrap its way from pixel-level interaction with the world, to individuating and tracking objects in the environment, to learning an effective policy for its behavior. We use methods drawn from computational scientific discovery to identify derived variables that support simplified models of the dynamics of the environment. These derived variables are abstracted to discrete qualitative variables, which serve as features for temporal difference learning. Our method bridges the gap between the continuous tracking of objects and the discrete state representation necessary for efficient and effective learning. We demonstrate and evaluate this approach with an agent experiencing a simple simulated world, through a sensory interface consisting of 60,000 time-varying binary variables in a 200 x 300 array, plus a three-valued motor signal and a real-valued reward signal.","PeriodicalId":366099,"journal":{"name":"2008 7th IEEE International Conference on Development and Learning","volume":"86 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"From pixels to policies: A bootstrapping agent\",\"authors\":\"J. Stober, B. Kuipers\",\"doi\":\"10.1109/DEVLRN.2008.4640813\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An embodied agent senses the world at the pixel level through a large number of sense elements. In order to function intelligently, an agent needs high-level concepts, grounded in the pixel level. For human designers to program these concepts and their grounding explicitly is almost certainly intractable, so the agent must learn these foundational concepts autonomously. We describe an approach by which an autonomous learning agent can bootstrap its way from pixel-level interaction with the world, to individuating and tracking objects in the environment, to learning an effective policy for its behavior. We use methods drawn from computational scientific discovery to identify derived variables that support simplified models of the dynamics of the environment. These derived variables are abstracted to discrete qualitative variables, which serve as features for temporal difference learning. Our method bridges the gap between the continuous tracking of objects and the discrete state representation necessary for efficient and effective learning. 
We demonstrate and evaluate this approach with an agent experiencing a simple simulated world, through a sensory interface consisting of 60,000 time-varying binary variables in a 200 x 300 array, plus a three-valued motor signal and a real-valued reward signal.\",\"PeriodicalId\":366099,\"journal\":{\"name\":\"2008 7th IEEE International Conference on Development and Learning\",\"volume\":\"86 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 7th IEEE International Conference on Development and Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DEVLRN.2008.4640813\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 7th IEEE International Conference on Development and Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DEVLRN.2008.4640813","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 26
Abstract
An embodied agent senses the world at the pixel level through a large number of sense elements. In order to function intelligently, an agent needs high-level concepts, grounded in the pixel level. For human designers to program these concepts and their grounding explicitly is almost certainly intractable, so the agent must learn these foundational concepts autonomously. We describe an approach by which an autonomous learning agent can bootstrap its way from pixel-level interaction with the world, to individuating and tracking objects in the environment, to learning an effective policy for its behavior. We use methods drawn from computational scientific discovery to identify derived variables that support simplified models of the dynamics of the environment. These derived variables are abstracted to discrete qualitative variables, which serve as features for temporal difference learning. Our method bridges the gap between the continuous tracking of objects and the discrete state representation necessary for efficient and effective learning. We demonstrate and evaluate this approach with an agent experiencing a simple simulated world, through a sensory interface consisting of 60,000 time-varying binary variables in a 200 x 300 array, plus a three-valued motor signal and a real-valued reward signal.
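The pipeline the abstract describes ends in temporal difference learning over discrete qualitative features. Below is a minimal sketch of that final stage, assuming a tabular TD(0) learner and a simple sign-based qualitative abstraction of the derived variables; the function names, thresholds, and learning parameters are illustrative assumptions, not the paper's actual implementation.

```python
from collections import defaultdict

def qualitative_sign(delta, eps=1e-3):
    """Abstract a continuous derived variable's change into a discrete
    qualitative value: -1 (decreasing), 0 (steady), +1 (increasing).
    The threshold eps is a hypothetical choice."""
    if delta > eps:
        return 1
    if delta < -eps:
        return -1
    return 0

# Tabular TD(0) value learning over discrete qualitative states.
ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor (assumed values)
V = defaultdict(float)    # value table keyed by tuples of qualitative values

def td_update(state, reward, next_state):
    """One temporal-difference backup:
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    V[state] += ALPHA * (reward + GAMMA * V[next_state] - V[state])

# Example: a state is a tuple of qualitative values of derived variables,
# e.g. (trend of a tracked object's distance, trend of its relative angle).
s  = (qualitative_sign(-0.02), qualitative_sign(0.0))
s2 = (qualitative_sign(-0.05), qualitative_sign(0.001))
td_update(s, reward=1.0, next_state=s2)
print(V[s])
```

The design point this illustrates mirrors the abstract: continuous quantities recovered from object tracking are collapsed to a small discrete state space, which is what makes tabular temporal difference learning tractable for an agent whose raw input is 60,000 time-varying binary pixels.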