TAMER: Training an Agent Manually via Evaluative Reinforcement

W. B. Knox, P. Stone
DOI: 10.1109/DEVLRN.2008.4640845
Published: 2008-10-10, in 2008 7th IEEE International Conference on Development and Learning
Citations: 136

Abstract

Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER) that allows a human to train a learning agent to perform a common class of complex tasks simply by giving scalar reward signals in response to the agent's observed actions. Specifically, in sequential decision making tasks, an agent models the human's reward function and chooses actions that it predicts will receive the most reward. Our novel algorithm is fully implemented and tested on the game Tetris. Leveraging the human trainer's feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.
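The abstract's core loop — fit a model of the human's reward function from scalar feedback, then act greedily on its predictions — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual implementation: the class name `TamerAgent`, the linear feature model, and the gradient-descent update are all illustrative assumptions.

```python
# Hypothetical sketch of the TAMER idea: the agent maintains a linear model
# H(s, a) = w · features(s, a) approximating the human's reward function,
# acts greedily on its predictions, and updates the model from each scalar
# feedback signal via supervised gradient descent.
class TamerAgent:
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # weights of the learned reward model H
        self.lr = lr                 # learning rate for the supervised update

    def predict(self, features):
        # Predicted human reward for one (state, action) feature vector.
        return sum(w * f for w, f in zip(self.w, features))

    def choose_action(self, candidates):
        # Greedy action selection: pick the candidate action whose feature
        # vector maximizes predicted human reward.
        return max(range(len(candidates)),
                   key=lambda i: self.predict(candidates[i]))

    def update(self, features, human_reward):
        # Move the model's prediction toward the scalar feedback the human
        # actually gave for this (state, action) pair.
        error = human_reward - self.predict(features)
        self.w = [w + self.lr * error * f
                  for w, f in zip(self.w, features)]
```

Note the contrast with standard reinforcement learning: the human's signal is treated as a direct label on the action just taken (a supervised target), not as a delayed environment reward to be propagated through a value function.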